I saw a very entertaining YouTube video where the host tested 10 Gbit Ethernet cards on a LAN to perform file transfers. He somehow could not reach the maximum theoretical transfer speed. Here is my explanation why, and it is probably that simple:
I do not know what protocol was used underneath for the transfer, but if it is TCP, there is something called the bandwidth-delay product: in order to keep the network pipe full, you need sufficiently large TCP send and receive buffers.
Let me show you how to compute that. RTT is basically given by ping. Let's assume a worst-case scenario with an RTT of 0.2 ms. 10 Gbit/sec is 1250 MB/sec, or 1,250,000,000 B/sec.
1,250,000,000 B/sec * 0.0002 sec = 250,000 B, or 250 KB. Last time I checked (I'm a Linux guy), the default Windows TCP buffer sizes were somewhere around 16 KB or perhaps 64 KB. These are the parameters you would need to play with to get close to the 10 Gbit/sec rate.
You can change the default value through the registry, or on a per-connection basis with the socket API.
If we take the maximum transfer rate in the video, which was roughly 360 MB/sec, and plug it into the bandwidth-delay product formula, I get 72 KB, which is very close to 64 KB. Pretty sure that this is the problem!
I always had the intuition that allocating and initializing memory to 0 in a single step could be faster than doing it yourself, even though modern C compilers replace memset() calls with inline assembly instructions. While browsing the glibc malloc source code for another problem, I had the perfect opportunity to validate my intuition, and it turns out that it was correct!
On Linux, the glibc heap manager uses the sbrk() system call to grow the heap. The fresh memory returned by sbrk() is initialized to, guess which value?, 0. The glibc heap manager keeps track of which memory in its heap was freshly returned by sbrk(), and calloc() can leverage this information to return memory that already contains zeros, skipping the memset() step entirely!
I benchmarked what I was writing and found out that, with glibc 2.18, this is true only for allocation sizes greater than roughly 64 KB; below that, malloc() + memset() is faster. I have reported this finding to the glibc mailing list. I still advocate preferring calloc(), as the current result is probably temporary and I expect future versions to remove this anomaly.
For those desiring to play with the test, you can get these files:
To build, do:
gcc -O3 -c calloc_emul.c
Play with the various #define values, then:
gcc -O3 -c tst-calloc.c
gcc -o tst-calloc tst-calloc.o calloc_emul.o
In this blog, I want you to find information about C++ programming that I had a hard time finding on the web in the first place.