| I wrote almost the exact same thing after seeing Sebastian Aaltonen's original offset allocator repo which inspired me to write my own clone in Rust. It's possible to further improve the fragmentation characteristics if the maximum size of the memory to be allocated is known ahead of time. The original used a 5.3 "floating point" scheme for the free bins, but given the maximum size you can opt for 4.4 or 6.2 or whatever can cover the whole region, giving more dense (or sparse) allocation granularity as needed. This same idea can also be extended to allocating texture atlas regions when the regions have power of two size. 256 bins can cover all possible texture sizes that GPUs can handle, so you can get hard O(1) texture packing. I'm also curious about making something like this work as a general purpose allocator (as alluded to by the README). For that it would be necessary to make a reverse lookup from pointer address to allocation handle, which would require something like a red-black tree covering the address space, which would no longer be 0(1). If anyone has ideas on this front, I would be happy to hear them. This algorithm is so clever and kind of a game changer. Allocating and freeing are fast real time operations (only dozens of CPU cycles and a few memory writes). It kind of makes "bump allocators" obsolete because this allocator is almost as fast but supports trimming and freeing allocations as well. This algorithm requires more memory but the amount is quite low. |
A radix tree can solve this: https://en.wikipedia.org/wiki/Radix_tree
I used one way, way back to do exactly the same thing: upon a free, I needed to look up all of the metadata for an address. For a 32-bit address space, you can just allocate a giant array up front, and use the address as an index. For a 64-bit address space, it's obviously way too big to statically allocate it up front. A radix tree neatly solves the problem. An arbitrarily sized radix tree is not constant lookup, but for reasons I honestly cannot remember, for a 64-bit address space, it's guaranteed to only be 3 levels deep.
See my implementation, which (I believe) I borrowed from tcmalloc: https://github.com/scotts/streamflow/blob/master/streamflow....