I came across an interesting Microsoft Support article on heap performance counters. Apparently there is a registry setting that enables heap counters in Perfmon, which lets users profile various aspects of the heaps in a process.
Perfmon.exe displays these counters when the following registry key is set:
One of the counters that caught my attention is Heap Lock Contention, which is the number of collisions per second on the heap lock. I learned about heap contention a while ago from Windows via C/C++, but I had never been able to measure it.
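Roughly speaking, a collision happens when a thread tries to take the heap lock while another thread already holds it. The sketch below is a hypothetical C++11 illustration of what such a counter counts. `CountingLock` and `run_contended` are my own names, not part of any Windows API. The real counter is maintained inside the Windows heap and read through Perfmon.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical lock that counts collisions: acquisitions that find the lock
// already held. It only illustrates the idea; it is not how the Windows heap
// implements its Heap Lock Contention counter.
class CountingLock {
public:
    void lock() {
        if (!mutex_.try_lock()) {                      // lock is busy: a collision
            collisions_.fetch_add(1, std::memory_order_relaxed);
            mutex_.lock();                             // then wait like a normal lock
        }
    }
    void unlock() { mutex_.unlock(); }
    std::size_t collisions() const { return collisions_.load(); }

private:
    std::mutex mutex_;
    std::atomic<std::size_t> collisions_{0};
};

// Two threads repeatedly take the same lock, the way two allocating threads
// share one heap lock. Returns the number of collisions observed.
std::size_t run_contended(CountingLock& lock, long& shared, int iterations) {
    auto worker = [&lock, &shared, iterations] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<CountingLock> guard(lock);
            ++shared;                                  // protected by the lock
        }
    };
    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    assert(shared >= 2L * iterations);
    return lock.collisions();
}
```

Dividing the collision count by the elapsed time gives collisions per second, the unit Perfmon reports. `std::mutex::try_lock` is allowed to fail spuriously, so this sketch may slightly overcount.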
In 2009, I wrote some test code to benchmark the Low-Fragmentation Heap (LFH). Recall that the original test is a single-threaded program that randomly allocates and deallocates buffers of various sizes many times.
With minor touch-ups, I modified the test code to run two threads in parallel. I then kicked off the modified test and added a Heap Lock Contention counter for the main process heap.
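The two-thread variant can be sketched as follows. This is not the original code. `std::thread` (C++11) stands in for whatever threading library the original used (the tools list mentions Boost 1.45), and `std::malloc`/`std::free` stand in for the process heap. The slot count and the subset of chunk sizes are arbitrary assumptions. The point is only that both threads allocate from the same shared heap, so they compete for its lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <thread>
#include <vector>

// A subset of the original chunk sizes (bytes); the subset is an assumption.
const std::size_t kSizes[] = {169, 251, 577, 1009, 4127};
const std::size_t kNumSizes = sizeof(kSizes) / sizeof(kSizes[0]);

// Hypothetical worker: randomly allocates into empty slots or frees full ones.
// Both threads call malloc/free, so they share one allocator and its lock.
std::size_t alloc_worker(unsigned seed, int iterations) {
    std::vector<void*> slots(64, nullptr);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick_slot(0, slots.size() - 1);
    std::uniform_int_distribution<std::size_t> pick_size(0, kNumSizes - 1);
    std::size_t ops = 0;
    for (int i = 0; i < iterations; ++i) {
        void*& slot = slots[pick_slot(rng)];
        if (slot != nullptr) {
            std::free(slot);
            slot = nullptr;
        } else {
            slot = std::malloc(kSizes[pick_size(rng)]);
        }
        ++ops;
    }
    for (void* p : slots) std::free(p);  // free(nullptr) is a no-op
    return ops;
}

// Runs two workers in parallel, as in the modified test, and returns total ops.
std::size_t run_two_threads(int iterations) {
    std::size_t ops1 = 0, ops2 = 0;
    std::thread t1([&ops1, iterations] { ops1 = alloc_worker(1u, iterations); });
    std::thread t2([&ops2, iterations] { ops2 = alloc_worker(2u, iterations); });
    t1.join();
    t2.join();
    assert(ops1 == ops2);
    return ops1 + ops2;
}
```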
The lock contention counter gathered some very interesting results. With the default allocator, the test program generated about 15 collisions per second on the heap lock.
I re-ran the test program with the LFH allocator (switchable through a command-line argument). Under Windows XP, the LFH allocator produced 50% less contention than the default allocator.
I could not get this counter to work properly under Windows 7. Microsoft mentions that only Windows Server 2003, Windows Vista, and Windows Server 2008 are enhanced.
If heap lock contention is a problem, Windows via C/C++ recommends creating a separate heap for allocation-intensive classes and overriding the class's new and delete operators to use it.
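Here is a minimal sketch of that pattern, built around a hypothetical allocation-heavy class named `Particle`. This is not the book's code. The class-specific `operator new` and `operator delete` draw from a private heap created with `HeapCreate`, so `Particle` allocations no longer take the process heap lock. The `#else` branch is only a fallback (plain `malloc`) so the sketch also builds outside Windows.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical allocation-intensive class whose instances live on their own heap.
class Particle {
public:
    double x = 0.0, y = 0.0, z = 0.0;

    static void* operator new(std::size_t size) {
        void* p = allocate(size);
        if (p == nullptr) throw std::bad_alloc();
        ++live_;
        return p;
    }

    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        release(p);
        --live_;
        assert(live_ >= 0 && "delete without a matching new");
    }

    static int live_count() { return live_.load(); }

private:
    static std::atomic<int> live_;  // Particle objects currently allocated

#ifdef _WIN32
    // One private, growable, serialized heap for every Particle, created on first use.
    static HANDLE heap() {
        static const HANDLE h = HeapCreate(0, 0, 0);
        return h;
    }
    static void* allocate(std::size_t n) {
        return heap() != NULL ? HeapAlloc(heap(), 0, n) : nullptr;
    }
    static void release(void* p) { HeapFree(heap(), 0, p); }
#else
    // Fallback so the sketch builds off Windows; a real port would use its own pool.
    static void* allocate(std::size_t n) { return std::malloc(n); }
    static void release(void* p) { std::free(p); }
#endif
};

std::atomic<int> Particle::live_{0};
```

Creating the heap with `HEAP_NO_SERIALIZE` would remove its lock entirely. That is only safe if every `Particle` is allocated and freed on a single thread.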
LFH outperforms the default allocator under Windows XP. The heap contention counter confirms my original 2009 test results.
Tools: Visual Studio 2008 (VC9), Boost 1.45, Windows XP SP3 (32-bit)
Recently, I read an MSDN article that describes the Low-Fragmentation Heap (LFH).
Applications that benefit most from the LFH are multi-threaded applications that allocate memory frequently and use a variety of allocation sizes under 16 KB. However, not all applications benefit from the LFH. To assess the effects of enabling the LFH in your application, use performance profiling data. … To enable the LFH for a heap, use the GetProcessHeap function to obtain a handle to the default heap of the calling process, or use the handle to a private heap created by the HeapCreate function. Then call the HeapSetInformation function with the handle.
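In code, the two steps the MSDN excerpt describes look roughly like this. Passing the value 2 with `HeapCompatibilityInformation` is the documented way to request the LFH. The function name `enable_lfh_on_process_heap` is hypothetical. The non-Windows branch just reports failure so the sketch compiles elsewhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: request the low-fragmentation heap for the process heap.
// Returns true if HeapSetInformation accepted the request.
bool enable_lfh_on_process_heap() {
#ifdef _WIN32
    HANDLE heap = GetProcessHeap();        // or a handle returned by HeapCreate
    assert(heap != NULL);
    ULONG info = 2;                        // 2 == enable the LFH
    return HeapSetInformation(heap, HeapCompatibilityInformation,
                              &info, sizeof(info)) != FALSE;
#else
    return false;                          // no Win32 heap on this platform
#endif
}
```

The call can fail in some cases, for example on a heap created with `HEAP_NO_SERIALIZE` or when heap debugging tools are active. Starting with Windows Vista, the system uses the LFH as needed by default, so the call mainly matters on Windows XP and Windows Server 2003.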
Alright, the MSDN description sounds great, but what does LFH really improve, when do these improvements kick in, and what are the side effects? I found some related articles on the internet, but they don't really answer my questions. I guess it is time to do some experiments.
Since LFH addresses heap fragmentation, the first task is to create a scenario where the heap is fragmented. Heap fragmentation occurs when many blocks of different sizes are allocated and deallocated frequently. So I wrote a test program that does the following:
1. The program runs for many iterations.
2. At each iteration, it randomly allocates or deallocates one chunk of memory.
3. The size of each allocated chunk is randomly chosen from a list of 169, 251, 577, 1009, 4127, 19139, 49069, 499033 and 999113 bytes. I chose prime numbers for fun.
Okay, I lied about item #3. It is not truly random: there is only a fixed number of chunks of each size, so the total number of chunks allocated is also fixed. Otherwise my computer could run out of memory.
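The three steps above, including the fixed-count twist, can be sketched as follows. This is a hypothetical reconstruction, not the original source. Each chunk size gets `per_size` slots. Every iteration picks a random slot and flips it: it allocates if the slot is empty and frees if it is full. `std::malloc`/`std::free` stand in for whichever heap is under test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Chunk sizes from the original test, in bytes (all primes).
const std::size_t kChunkSizes[] = {169, 251, 577, 1009, 4127,
                                   19139, 49069, 499033, 999113};

struct Slot {
    std::size_t size;  // fixed chunk size for this slot
    void* ptr;         // nullptr while the slot is empty
};

struct Stats {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
    std::size_t peak_requested = 0;  // most bytes the test held at one time
};

// Hypothetical reconstruction of the test: per_size slots per chunk size caps
// how many chunks of each size (and in total) can be live at once.
Stats run_fragmentation_test(std::size_t per_size, std::size_t iterations,
                             unsigned seed) {
    assert(per_size > 0);
    std::vector<Slot> slots;
    for (std::size_t size : kChunkSizes)
        for (std::size_t i = 0; i < per_size; ++i)
            slots.push_back(Slot{size, nullptr});

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, slots.size() - 1);
    Stats stats;
    std::size_t requested = 0;  // bytes currently held by the test
    for (std::size_t it = 0; it < iterations; ++it) {
        Slot& s = slots[pick(rng)];
        if (s.ptr != nullptr) {          // full slot: free it
            std::free(s.ptr);
            s.ptr = nullptr;
            requested -= s.size;
            ++stats.deallocations;
        } else {                         // empty slot: allocate its fixed size
            s.ptr = std::malloc(s.size);
            if (s.ptr != nullptr) {
                requested += s.size;
                ++stats.allocations;
                if (requested > stats.peak_requested)
                    stats.peak_requested = requested;
            }
        }
    }
    for (Slot& s : slots) std::free(s.ptr);  // release anything still held
    return stats;
}
```

`requested` only tracks what the program asked for. The memory overhead discussed below is measured against what the OS actually allocated, which this sketch does not read.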
I ran the program with both the default allocator and LFH. Here are the results from the test program.
Memory overhead is the difference between the memory the program requests and the memory the OS actually allocates. In theory, heap fragmentation can cause the heap to grow larger than it needs to be. The first graph shows that LFH uses more memory up front in the earlier iterations. After 25,600,000 iterations, however, the heap is probably fragmented enough that the default allocator's memory overhead increases significantly.
The second graph shows the number of page faults that occurred. LFH seems to generate far fewer page faults than the default allocation policy. To be honest, I am not sure how much this matters, since many of them could be soft (minor) page faults.
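One way to look into this from inside the program is to read the page-fault counters directly. Here is a minimal sketch with hypothetical names (`FaultSnapshot`, `read_page_faults`, `faults_during`). On Windows, `GetProcessMemoryInfo` reports `PageFaultCount`, which lumps soft and hard faults together, so it cannot settle the question by itself. On POSIX systems, `getrusage` reports minor and major faults separately.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#include <psapi.h>   // GetProcessMemoryInfo; link with psapi.lib
#else
#include <sys/resource.h>
#endif

// Hypothetical snapshot of this process's page-fault counters.
struct FaultSnapshot {
    bool ok = false;          // false if the counters could not be read
    std::uint64_t total = 0;  // soft + hard faults
    std::uint64_t hard = 0;   // faults that needed disk I/O (left at 0 on Windows)
};

FaultSnapshot read_page_faults() {
    FaultSnapshot c;
#ifdef _WIN32
    PROCESS_MEMORY_COUNTERS pmc;
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        c.ok = true;
        c.total = pmc.PageFaultCount;  // not split into soft and hard faults
    }
#else
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        c.ok = true;
        c.hard = static_cast<std::uint64_t>(ru.ru_majflt);
        c.total = static_cast<std::uint64_t>(ru.ru_minflt) + c.hard;
    }
#endif
    return c;
}

// Page faults incurred while running f, e.g. one batch of test iterations.
template <class F>
std::uint64_t faults_during(F&& f) {
    const FaultSnapshot before = read_page_faults();
    f();
    const FaultSnapshot after = read_page_faults();
    assert(before.ok && after.ok);
    return after.total - before.total;
}
```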
The third graph compares the speed of LFH and the default allocation policy, measured as the number of allocations and deallocations performed per second. LFH is consistently faster than the default allocation policy. As the number of iterations increases, the default allocator's performance degrades significantly.
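The speed metric, operations (allocations plus deallocations) per second, can be computed with a small timing wrapper. This is a minimal sketch: `ops_per_second` is a hypothetical helper that times any workload returning its own operation count, for example one batch of test iterations. Measuring batch by batch is what would expose a slowdown as the iteration count grows.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `workload`, which must return how many allocations
// plus deallocations it performed, and converts that count into a rate.
template <class Workload>
double ops_per_second(Workload&& workload) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    const std::size_t ops = workload();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    assert(elapsed.count() >= 0.0);  // steady_clock never goes backwards
    // Guard against a zero reading on a coarse clock.
    return elapsed.count() > 0.0 ? static_cast<double>(ops) / elapsed.count() : 0.0;
}
```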
There is little doubt that LFH outperforms the default allocation policy in this test program. But whether or not to enable LFH should be decided case by case. Programs that run only for a short time will use more memory with LFH and have little to gain.
The test program runs in a single thread. According to the MSDN documentation, multi-threaded programs benefit most from the LFH, so this analysis is not complete. I will update it when I have more time.
[Update 2011/03/22: 18 months later, I finally got around to testing this with a multi-threaded program. See Heap Performance Counter for the results.]
The source and the spreadsheet can be downloaded here.
Compiler: Visual Studio 2008
Machine specification: Intel Core Duo T2300 1.66 GHz with 2 GB of RAM.