I always had the intuition that allocating memory and initializing it to zero in a single step could be faster than doing the two operations yourself, even though modern C compilers replace memset() calls with inline assembly instructions. While browsing the glibc malloc source code for another problem, I had the perfect opportunity to validate this intuition, and it turns out it was correct!
On Linux, the glibc heap manager uses the sbrk() system call to grow the heap. The fresh memory returned by sbrk() is initialized to, guess which value?, 0. The glibc heap manager keeps track of which memory in its heap was freshly returned by sbrk(), and calloc() can leverage this information to return memory that already contains zeros, skipping the memset() step entirely!
I benchmarked this claim and found that, with glibc 2.18, it only holds for allocation sizes above roughly 64 KB. Below that, malloc() + memset() is faster. I have reported this finding to the glibc mailing list. I still advocate preferring calloc(), as the current result is probably temporary and I expect future versions to remove this anomaly.
For those who want to play with the test, you can get these files:
To build, do:
gcc -O3 -c calloc_emul.c
Play with the various #define values, then:
gcc -O3 -c tst-calloc.c
gcc -o tst-calloc tst-calloc.o calloc_emul.o
When I start an application with glc-capture, the application becomes frozen and unresponsive.
It even ignores kill -11 signals!
With gdb, I found that glc blocks inside a dlopen() call.
#0 0xf76ff430 in __kernel_vsyscall ()
#1 0xf7591231 in __lll_lock_wait_private () from /usr/lib32/libc.so.6
#2 0xf750fd8a in _L_lock_6856 () from /usr/lib32/libc.so.6
#3 0xf750d77d in malloc () from /usr/lib32/libc.so.6
#4 0xf770d05b in _dl_map_object_deps () from /lib/ld-linux.so.2
#5 0xf7712f8b in dl_open_worker () from /lib/ld-linux.so.2
#6 0xf770ee1a in _dl_catch_error () from /lib/ld-linux.so.2
#7 0xf7712954 in _dl_open () from /lib/ld-linux.so.2
#8 0xf75bb67b in do_dlopen () from /usr/lib32/libc.so.6
#9 0xf770ee1a in _dl_catch_error () from /lib/ld-linux.so.2
#10 0xf75bb76b in dlerror_run () from /usr/lib32/libc.so.6
#11 0xf75bb7f1 in __libc_dlopen_mode () from /usr/lib32/libc.so.6
#12 0xf7591ad8 in init () from /usr/lib32/libc.so.6
#13 0xf769c0ee in pthread_once () from /usr/lib32/libpthread.so.0
#14 0xf7591d45 in backtrace () from /usr/lib32/libc.so.6
#15 0xf74adc63 in backtrace_and_maps () from /usr/lib32/libc.so.6
#16 0xf7504263 in __libc_message () from /usr/lib32/libc.so.6
#17 0xf750a3ca in malloc_printerr () from /usr/lib32/libc.so.6
#18 0xf750be11 in _int_malloc () from /usr/lib32/libc.so.6
#19 0xf750d788 in malloc () from /usr/lib32/libc.so.6
#20 0xf770d05b in _dl_map_object_deps () from /lib/ld-linux.so.2
#21 0xf7712f8b in dl_open_worker () from /lib/ld-linux.so.2
#22 0xf770ee1a in _dl_catch_error () from /lib/ld-linux.so.2
#23 0xf7712954 in _dl_open () from /lib/ld-linux.so.2
#24 0xf768bcbc in ?? () from /usr/lib32/libdl.so.2
#25 0xf770ee1a in _dl_catch_error () from /lib/ld-linux.so.2
#26 0xf768c37c in ?? () from /usr/lib32/libdl.so.2
#27 0xf768bd71 in dlopen () from /usr/lib32/libdl.so.2
#28 0xf76c6077 in get_real_alsa () at /home/lano1106/dev/lib32-glc/src/glc/src/hook/alsa.c:256
#29 0xf76c64d9 in alsa_init (glc=glc@entry=0xf76cd280 <mpriv>) at /home/lano1106/dev/lib32-glc/src/glc/src/hook/alsa.c:96
#30 0xf76c29ab in init_glc () at /home/lano1106/dev/lib32-glc/src/glc/src/hook/main.c:114
#31 0xf76c56f5 in __opengl_glXGetProcAddressARB (proc_name=0x928a8c2 "glDeleteFencesNV") at /home/lano1106/dev/lib32-glc/src/glc/src/hook/opengl.c:332
#32 0xf76ee80c in glXGetProcAddressARB () from /home/lano1106/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so
I was suspicious about the malloc_printerr() call, so I went to the glibc malloc source code. (By the way, I found something else interesting there to share in my next post concerning glibc heap management.) My suspicion was right: when you see malloc_printerr() in a stack, it is pretty much game over, as it means that malloc() has detected memory corruption in the heap.
Since this happens before main() is even invoked, while shared libraries are being loaded, even __libc_message() needs something loaded by the dynamic loader. That reenters malloc() while the lock protecting it is still held from the very allocation that detected the corruption, and the process deadlocks on itself!
Fortunately, I only enter this state when using a particular glc command line option, so it should not be too hard to find the offending code.
Update: I have found the bug!
I was looking to screen capture OpenGL gaming sessions on Linux. On my first attempt, I tried to use ffmpeg x11grab. This initial attempt failed since all recent OpenGL graphics card drivers support direct rendering, which means OpenGL rendering simply bypasses the X server and makes x11grab useless: x11grab will grab whatever is on the screen behind the fullscreen foreground OpenGL window that uses direct rendering.
I then found glc, which, even though there is no recent activity on its GitHub page, is very effective at what it does. There is not a lot of documentation for it, but you can find a basic usage wiki page.
How does glc work?
It places itself between the application to capture and the ALSA and OpenGL APIs with the help of a library called elfhacks, intercepts the raw data, and stores it in a glc file that can become huge. The fact that it supports only very minimal realtime compression is one of the tool's weaknesses. To give you an idea of how big the file can become, here is a quick calculation for a 1920x1080 resolution: 1920x1080 is about 2 million pixels. With each pixel taking 4 bytes, that gives about 8 MB per frame. At 30 fps, that is a little less than 250 MB per second! Even with compression applied, I think I have seen disk space consumption close to 1 GB per minute. I plan to modify glc to pipe the captured raw data to ffmpeg to possibly improve the compression a little.
I intend to write a couple of posts on glc because there is not a lot written about it, and hopefully they will help many people start using it, so I will cover the things I had to set up to be able to use it. The first thing to clarify is glc's audio capture options. I was under the impression that I had to use the -a option and provide an ALSA capture device to capture the application's PCM streams. In fact, glc can record several PCM streams.
First, it places ALSA hooks to intercept the application's calls to ALSA and record the application's PCM streams. This is the default behavior, and nothing needs to be added to the command line to record this audio. You can disable this recording with the option --disable-audio.
Secondly, you can add additional streams by using the -a option.
There is a third option that I developed due to my misinterpretation of how glc captures audio. You can disable glc's ALSA hook recording and instead use the ALSA snd_aloop driver, fed by a plug PCM that splits the stream between the sound card and the loopback device. This can be handy if you have a 7.1 surround system with a 192 kHz sampling rate and you want to downsample the stream and reduce the number of recorded channels. Even if this is not something you are confronted with, I will try to make it useful as an ALSA tutorial. The ALSA lib Doxygen output documents the API well but, IMHO, it lacks good examples to help understand how to use ALSA.
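For reference, a setup along those lines can be sketched in ~/.asoundrc with the classic route-over-multi recipe. This is my own hypothetical sketch, not glc's configuration: the device names ("hw:0,0" for the real card, "hw:Loopback,0,0" for snd_aloop's playback side) are assumptions you must adapt, and snd_aloop must be loaded first (modprobe snd_aloop).

```
# 4-channel virtual device spanning the real card and the loopback
pcm.multi_out {
    type multi
    slaves.a.pcm "hw:0,0"              # real sound card (assumption)
    slaves.a.channels 2
    slaves.b.pcm "hw:Loopback,0,0"     # snd_aloop playback side (assumption)
    slaves.b.channels 2
    bindings.0.slave a
    bindings.0.channel 0
    bindings.1.slave a
    bindings.1.channel 1
    bindings.2.slave b
    bindings.2.channel 0
    bindings.3.slave b
    bindings.3.channel 1
}

# Duplicate the application's stereo stream onto both channel pairs
pcm.both {
    type route
    slave.pcm "multi_out"
    slave.channels 4
    ttable.0.0 1
    ttable.1.1 1
    ttable.0.2 1
    ttable.1.3 1
}
```

The application then plays to the "both" PCM, while a recorder reads the duplicated stream back from the loopback's capture side (e.g. "hw:Loopback,1,0").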
On the GL side, frames can be acquired by two different methods: glReadPixels or GL_ARB_pixel_buffer_object (PBOs). You can use the --pbo switch to tell glc to try to use PBOs for frame acquisition. PBOs were introduced after glReadPixels. Their usage can provide a performance edge over the older method for reasons that escape my understanding. I will try to benchmark both methods to be able to compare them.
I want you to find in this blog information about C++ programming that I had a hard time finding on the web in the first place.