|
|
|
|
|
by 8dcc
528 days ago
|
|
Hello, I am the author. Thank you all for the instructive comments, I made some changes to the article since I first posted it: - Added a link to this HN post. - Renamed some variables and types in the code for readability. - Mention (in a footnote) that we could allocate the 'Pool' structure and the chunk arrays with a single call, or even return the 'Pool' on the stack. Suggested by 'liontwist' and 'unwind'. - Mention (in a footnote) that we don't always need to store the full address when building the linked list of free chunks. Suggested by 'cb321'. - Added valgrind support (also to my 'libpool' project). Suggested by 'kazinator'. |
|
This can really add up. For example, like 20 years ago, I did a `print sizeof` in gdb for some g++ STL map node with a data payload of like 4 bytes and it was some obscene size approximately >= 1 order of magnitude bigger than necessary (like 80 bytes instead of, say 8 with 2Byte ptrs). While main memory is cheap, CPU cache memory is not and for "linky" structures, it often pays to be more cache-friendly. It's pretty easy to imagine a 10X smaller thing fitting into an L2 CPU cache where the larger thing would not even fit in L3, netting you a big latency boost (though the more root-ward levels of the tree likely always stay in caches in a hot loop).
EDIT: Anyway, you don't have to believe me. Others did an x32 ABI for x86_64 mostly to have "merely 2X" smaller pointers: https://en.wikipedia.org/wiki/X32_ABI
[1] https://news.ycombinator.com/item?id=42643410