I am developing a secure P2P file transfer tool in C, intended for sending arbitrary-size files between two machines running the program.
I have been trying to figure out what the current best-known techniques are for a versatile yet near-optimal approach to reading/writing large files.
The data flow is as follows:
read() file chunk into buffer
|
v
encrypt chunk
|
v
compress chunk
|
v
write() to TCP socket
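For illustration, the send loop I have in mind looks roughly like this (a sketch only; encrypt_chunk() and compress_chunk() are placeholders for my protocol code, and CHUNK_SIZE plus the buffer headroom are arbitrary picks):

#include <stddef.h>
#include <unistd.h>

#define CHUNK_SIZE (64 * 1024)

/* Placeholder protocol hooks (not real APIs); each returns the
 * number of bytes written to out. */
size_t encrypt_chunk(const unsigned char *in, size_t n, unsigned char *out);
size_t compress_chunk(const unsigned char *in, size_t n, unsigned char *out);

int send_file(int file_fd, int sock_fd)
{
    static unsigned char plain[CHUNK_SIZE];
    static unsigned char enc[CHUNK_SIZE + 256];   /* headroom for cipher overhead */
    static unsigned char comp[CHUNK_SIZE + 512];  /* headroom for incompressible input */
    ssize_t n;

    while ((n = read(file_fd, plain, sizeof plain)) > 0) {
        size_t elen = encrypt_chunk(plain, (size_t)n, enc);
        size_t clen = compress_chunk(enc, elen, comp);

        /* write() may send fewer bytes than asked; loop until the chunk is out. */
        for (size_t off = 0; off < clen; ) {
            ssize_t w = write(sock_fd, comp + off, clen - off);
            if (w < 0)
                return -1;
            off += (size_t)w;
        }
    }
    return n < 0 ? -1 : 0;
}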
A couple of details I'd like to point out:
- The encryption (and compression) are done within the application, since the protocol is built into the application; therefore something like kTLS does not apply here.
- It has been suggested that I profile different I/O techniques before settling on any one design. However, the project is at too early a stage for me to do any such profiling: I am only partially done with the client state machine, and a server that understands these protocol-specific messages does not exist yet.
io_uring's difficult interface would add more complexity than I can track in the project right now, but it may be a viable optimization/refactor later on.
For reads, I am currently mmap()ing in chunks of at most 48 * PAGE_SIZE for files >= 4 GB, and at most 24 * PAGE_SIZE for everything smaller (these are completely arbitrary numbers that I just went with).
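Schematically, the read side looks like this (a simplified sketch of what I have now; error handling and the partial final chunk are omitted, and a 64-bit off_t is assumed):

#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/* The arbitrary limits mentioned above: 48 pages for files >= 4 GB,
 * 24 pages for everything smaller. */
static size_t chunk_limit(off_t file_size)
{
    size_t page = (size_t)sysconf(_SC_PAGE_SIZE);
    return (file_size >= ((off_t)4 << 30)) ? 48 * page : 24 * page;
}

/* Map one window of the file and feed it to the pipeline. Offsets stay
 * page-aligned as long as we advance by a multiple of the page size. */
static int process_window(int fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, offset);
    if (p == MAP_FAILED)
        return -1;
    /* ... encrypt/compress/send the bytes at p ... */
    munmap(p, len);
    return 0;
}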
For writes, I just write() to the file from within a loop that receives data from the TCP socket.
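That is, roughly the following (decrypt/decompress steps omitted for brevity; the buffer size is another arbitrary pick):

#include <sys/socket.h>
#include <unistd.h>

#define RECV_BUF_SIZE (64 * 1024)

int receive_file(int sock_fd, int file_fd)
{
    static unsigned char buf[RECV_BUF_SIZE];
    ssize_t n;

    while ((n = recv(sock_fd, buf, sizeof buf, 0)) > 0) {
        /* write() may store fewer bytes than asked; loop until done. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(file_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;  /* 0 on orderly shutdown, -1 on error */
}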
I found a 2003 mailing-list thread between a few folks and Linus Torvalds, in which he says:
Quite a lot of operations could be done directly on the page cache. I'm not a huge fan of mmap() myself - the biggest advantage of mmap is when you don't know your access patterns, and you have reasonably good locality. In many other cases mmap is just a total loss, because the page table walking is often more expensive than even a memcpy().
...
memcpy() often gets a bad name. Yeah, memory is slow, but especially if you copy something you just worked on, you're actually often better off letting the CPU cache do its job, rather than walking page tables and trying to be clever.
Just as an example: copying often means that you don't need nearly as much locking and synchronization - which in turn avoids one whole big mess (yes, the memcpy() will look very hot in profiles, but then doing extra work to avoid the memcpy() will cause spread-out overhead that is a lot worse and harder to think about).
This is why a simple read()/write() loop often beats mmap approaches. And often it's actually better to not even have big buffers (ie the old "avoid system calls by aggregation" approach) because that just blows your cache away.
Right now, the fastest way to copy a file is apparently by doing lots of ~8kB read/write pairs (that data may be slightly stale, but it was true at some point). Never mind the system call overhead - just having the extra buffer stay in the L1 cache and avoiding page faults from mmap is a bigger win.
Besides the fact that this thread is 21 years old, it also predates the Spectre/Meltdown attacks; syscalls have become much more expensive since then, so minimizing syscalls seems imperative.
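For concreteness, the read()/write() pattern he describes is just the classic small-buffer copy loop:

#include <unistd.h>

/* Plain copy loop with a small buffer that stays hot in L1 cache:
 * one read()/write() pair per ~8 kB chunk, no mmap, no page faults. */
static int copy_fd(int in_fd, int out_fd)
{
    char buf[8 * 1024];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}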
To summarize this post with all the above context: how should I approach designing the I/O interface for my application when I don't have the means to profile at this stage? What would be my best bet? And where can I learn the nifty I/O tricks used for workloads like this?
1 Answer
Performance is always best measured, not predicted. So,
how should I approach designing the I/O interface for my application when I don't have the means to profile at this stage?
First make it work. Only then focus on making it faster, and that with the support of profiling and performance testing.
You should start with something -- anything -- that does the job and is easy to write, to validate, and to reason about. For instance, straightforward reading in 8 kB chunks with read(). Make this modular so that it is easy to swap out for anything else you want to test.
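For instance, a small function-pointer interface is enough to keep the backend swappable (a sketch only; the names here are illustrative, not from the question's code):

#include <stddef.h>
#include <unistd.h>

/* One way to abstract "give me the next chunk" so the backend
 * (read(), mmap(), io_uring, ...) can be swapped without touching
 * the encrypt/compress/send pipeline that consumes it. */
struct chunk_source {
    /* Fill buf with up to len bytes; return bytes produced,
     * 0 on end-of-file, -1 on error. */
    ssize_t (*next)(void *ctx, void *buf, size_t len);
    void *ctx;
};

/* The simple starting backend: plain read() on a file descriptor. */
static ssize_t read_next(void *ctx, void *buf, size_t len)
{
    return read(*(int *)ctx, buf, len);
}

A later mmap()- or io_uring-backed next can then drop in behind the same struct, leaving the rest of the pipeline untouched.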
It may be that when you get around to performance testing, you discover that your modular I/O interface itself is a significant bottleneck. At this point you have data to guide you as to what to do instead.
At worst, you throw away a whole first iteration, and write a new one in light of the lessons learned. Some people even develop applications with the a priori expectation that they will need to do this.
Comments:
– chux: write() … and the rest?
– Martin Brown: compress may even make the resulting block larger and will certainly consume considerable computational time.