Hacker News new | ask | show | jobs
by exDM69 782 days ago
> Aw. Why not?

Because the global allocator API is defined as free(pointer address) and this used free(buffer handle).

It would require a reverse lookup structure from address to buffer handle, e.g. red-black tree. Maintaining it would no longer be O(1).

> the fact that the number of bits is constant doesn't make this O(1), it just makes it O(n) where n := 64

O(n) when n is constant is equal to O(1).

But lzcnt uses a fixed number of clock cycles so it's not really O(n) either.

4 comments

GlobalAlloc is also required to be thread safe, and TLSF doesn't even attempt to handle that. I suppose you could get away with it on single-threaded platforms like microcontrollers or (vanilla) WebAssembly but it wouldn't generalize unless you wrapped it in a mutex, and then the performance would be terrible.
Correct me if I'm wrong but can't you put a Rc<RefCell<OffsetAllocator>> in your std::alloc::Allocator implementation for interior mutability?

This make a non-thread safe allocator, with the (desired) side effect of making anything allocating from it (e.g. Vec) a !Send.

Or is there a requirement that Allocators must be Sync and Send?

This is about GlobalAlloc, the trait that actually exists in stable Rust. Only static variables can be marked as the #[global_allocator], and statics are always required to be Send + Sync. Of course, if you're really sure that only a single thread will ever exist, you can pinky-promise it by putting a couple unsafe impls on the allocator type. But then the language won't protect you if you do try to allocate or deallocate from multiple threads.

(More practically, you can just write a GlobalAlloc that forwards to an instance of a thread-local allocator. Though that runs the risk of thread-local deinitialization running in an unexpected order.)

Any uc worth programming on has interrupts!
Even with interrupts there is almost always a single execution unit, and thus single-threaded. An interrupt is just a jump instruction to a predefined handler. The code that was running is entirely stopped until the system decides to resume from that point.
That doesn't mean that naive single-threaded code is necessarily interrupt-safe. If the allocator is interrupted and the interrupt service routine needs to make use of the allocator's data structures for any reason there could very much be a conflict.

This particular algorithm is focused on GPUs, so I'm not clear on the applicability. However I did want to clear up the idea that there are no concurrency concerns for single-threaded library code, an assumption that's been the source of many bugs.

The presence of interrupts is basically equivalent to not being single threaded, though. There are true single-threaded environments, it's just limited to userspace processes that don't handle signals.
It is even worse: re-entrancy safe is a different property than thread-safe. For example you can't use mutexes to guarantee mutual exclusions with interrupts.

Some algorithms are both thread safe and reentrancy safe, but generally this is not the case.

> It would require a reverse lookup structure from address to buffer handle, e.g. red-black tree. Maintaining it would no longer be O(1).

Well, another solution is enlarging each allocation by 8 bytes and storing the buffer handle in front of the returned allocation. So `free(ptr)` would get the handle as `*((ptr as *const u64) - 1)`.

That works but...

This kind of allocators are usually used for suballocating GPU buffers, so hiding a few bytes of metadata "in-band" can mess up your alignment requirements and not all kinds of GPU memory are even accessible by CPU. Due to false sharing cache problems you would probably want a full cache line (64 bytes) to store the metadata.

For a CPU-only memory allocator your idea could work quite well. It can also be implemented on top of this code without any modifications.

OTOH, I think this would be a good fit for the "Store" proposal [1] which uses handles rather than addresses to refer to allocations.

[1] https://github.com/matthieu-m/storage/blob/main/etc/rfc.md

> It would require a reverse lookup structure from address to buffer handle, e.g. red-black tree. Maintaining it would no longer be O(1).

Not necessarily. If you are able to map a block of normal memory in a known location relative to the GPU memory that you are managing with an "offset allocator", it should be possible to directly calculate the metadata location for each "offset". This is how most allocators find arenas/buckets (whatever you want to call them) for an allocation by a pointer.

Something like this:

  +-------------------------+ 0x000000 (start of managed memory)
  | Metadata                | 
  |                         |
  |                         |
  | ...                     |
  +-------------------------+ 0x001000 
  | Padding ...             |
  +-------------------------+ 0x010000
  | GPU Memory Block        |
  |                         |
  |                         |
  |                         | ~2MB block
  |                         |
  |                         |
  |                         |
  +-------------------------+ 0x210000
With this layout, to get to a metadata for an allocation, all you need to do is to align down the allocation pointer and calculate the appropriate location in the metadata page.

This obviously won't work in all scenarios, but it's a simple and practical way around a map lookup.

A good idea but not O(1).

If you have a large allocation you have to mark every "page" in metadata table which makes allocation O(size in bytes / page size).

The overhead might still be practically acceptable, even if it is not constant time.