by eatonphil 319 days ago
One of the areas I wonder about this a lot is when integrating Rust code into Postgres which has its own allocator system. Mostly right now when we need to have complex data structures (non-Postgres data structures) that must live outside of the lexical scope we put them somewhere global and return a handle to the C code to reference the object. But with the upcoming support for passing an allocator to any data structure (in the Rust standard library anyway) I think this gets a lot easier?
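For readers unfamiliar with the "put it somewhere global and hand C a handle" pattern, here is a minimal sketch in plain Rust. Everything in it is hypothetical: `Index`, `index_create`, `index_push`, `index_len` and `index_free` are illustrative names, the objects live in Rust's global allocator rather than in a Postgres memory context, and a real export to C would also need `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition) plus a matching C declaration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Hypothetical long-lived Rust object that C code refers to only by handle.
pub struct Index {
    entries: Vec<i64>,
}

// Global registry mapping opaque handles to boxed objects. `Mutex::new` is
// const, so the map itself is created lazily on first use.
static REGISTRY: Mutex<Option<HashMap<u64, Box<Index>>>> = Mutex::new(None);
static NEXT_HANDLE: AtomicU64 = AtomicU64::new(1);

/// Creates an object and returns its handle (never 0).
pub extern "C" fn index_create() -> u64 {
    let handle = NEXT_HANDLE.fetch_add(1, Ordering::Relaxed);
    let mut guard = REGISTRY.lock().unwrap();
    guard
        .get_or_insert_with(HashMap::new)
        .insert(handle, Box::new(Index { entries: Vec::new() }));
    handle
}

/// Appends a value; returns false if the handle is unknown or freed.
pub extern "C" fn index_push(handle: u64, value: i64) -> bool {
    let mut guard = REGISTRY.lock().unwrap();
    match guard.as_mut().and_then(|m| m.get_mut(&handle)) {
        Some(idx) => {
            idx.entries.push(value);
            true
        }
        None => false,
    }
}

/// Returns the number of entries, or -1 for an unknown or freed handle.
pub extern "C" fn index_len(handle: u64) -> i64 {
    let guard = REGISTRY.lock().unwrap();
    guard
        .as_ref()
        .and_then(|m| m.get(&handle))
        .map_or(-1, |idx| idx.entries.len() as i64)
}

/// Drops the object; returns false if the handle was already freed.
pub extern "C" fn index_free(handle: u64) -> bool {
    let mut guard = REGISTRY.lock().unwrap();
    guard.as_mut().map_or(false, |m| m.remove(&handle).is_some())
}
```

The C side only ever sees a `u64`, so the Rust object's lifetime is decoupled from any lexical scope; the cost is that freeing is manual and a stale handle has to be detected at runtime.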

For me, the most interesting thing about Allocator is that it's allowed to say: OK, you wanted 185 bytes, but I only have a 256-byte allocation here, so here are 256 bytes.

This means that, e.g., a growable container type doesn't have to guess that your allocator probably loves powers of 2 and therefore try growing to 256 bytes rather than 185. It can ask for 185 bytes, get 256, and pass that extra capacity on to the user. Significant performance is left on the table when everybody is guessing and can't pass on what they know due to ABI limitations.
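A minimal sketch of "ask for what you need, use what you got", under stated assumptions: `granted_size` is a hypothetical size-class allocator that rounds every request up to the next power of two (minimum 16 bytes), loosely mirroring how the unstable `Allocator::allocate` may return a block larger than the requested layout. `usable_capacity` is likewise a hypothetical helper, not a std API.

```rust
/// Hypothetical size-class allocator: rounds every request up to the next
/// power of two (minimum 16 bytes) and reports the size it actually granted.
fn granted_size(requested_bytes: usize) -> usize {
    requested_bytes.max(16).next_power_of_two()
}

/// Capacity a container can expose if it asks for exactly `wanted` elements
/// and passes on whatever the allocator granted, instead of guessing.
fn usable_capacity(wanted: usize, elem_size: usize) -> usize {
    assert!(elem_size > 0, "this sketch ignores zero-sized types");
    granted_size(wanted * elem_size) / elem_size
}
```

With one-byte elements, asking for 185 gets a 256-byte block, so the container can report a capacity of 256 without having hard-coded a power-of-two growth policy of its own.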

Rust containers such as Vec are already prepared for this. For example, Vec::reserve_exact does not promise that you get exactly the capacity you asked for: it won't do the exponential growth trick, because (unlike Vec::reserve) you've said you don't want that, but it could still take advantage of a larger capacity provided by the allocator.
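The documented contract can be exercised with nothing but the stable Vec API. The exact capacities returned depend on the std version and the allocator, so this sketch only asserts what the docs promise: capacity is at least `len + additional`.

```rust
/// Grows two identical 8-element vectors by one more element, once with
/// `reserve` (amortized growth allowed) and once with `reserve_exact`.
/// Returns both resulting capacities.
fn grow_both() -> (usize, usize) {
    let mut amortized: Vec<u8> = vec![0; 8];
    amortized.reserve(1); // may over-allocate to amortize future pushes

    let mut exact: Vec<u8> = vec![0; 8];
    exact.reserve_exact(1); // opts out of amortized growth...

    // ...but both are only documented to guarantee capacity >= len + additional,
    // because the allocator may hand back more than was requested.
    (amortized.capacity(), exact.capacity())
}
```

The only portable check on either result is `>=`.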

There's so much more information that code could give allocators but doesn't due to being stuck with ancient C APIs. Is the allocation likely to be short lived? Is speed or efficiency more important? Is it going to be accessed by multiple threads? Is it likely to grow in future?
That seems suspect to me. If I call reserve_exact I do actually mean reserve_exact, and I want .capacity() to return the argument I passed to reserve_exact(). This is commonly done when using Vec as a fixed capacity buffer and you don't want to add another field to whatever owns it that's semantically equivalent to .capacity().

I don't really care if the memory region extends past capacity * size_of::<T>(), but I do want to be able to check if .len() == .capacity() and not be surprised.

> This is commonly done when using Vec as a fixed capacity buffer and you don't want to add another field to whatever owns it that's semantically equivalent to .capacity().

The documentation for Vec already explains exactly what it's offering you, but let's explore: what exactly is the problem? You've said this is "commonly done", so doubtless you can point at examples for reference.

Suppose a Goose is 40 bytes in size and we aim to store, say, 4 of them. For some reason we decide to call Vec::new() and then Vec::reserve_exact(..., 4) rather than, more naturally (but with the same effect), Vec::with_capacity(4). Alas, the allocator underpinning our system only has 128- or 256-byte blocks to give: 4 × 40 = 160 bytes, too big for 128, so a 256-byte block is allocated and (a hypothetical future) Vec sets capacity to 6 anyway, since 256 / 40 rounds down to 6.
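The arithmetic in that scenario, written out as a sketch. `Goose` is modelled as a plain 40-byte array; `BLOCK_SIZES`, `pick_block` and `capacity_after_reserve_exact` are hypothetical helpers standing in for an allocator with only 128- and 256-byte blocks and for the hypothetical future Vec the comment describes, one that reports the granted capacity.

```rust
use std::mem::size_of;

// A 40-byte element, as in the example.
#[allow(dead_code)]
struct Goose([u8; 40]);

// Hypothetical allocator that only hands out these block sizes (ascending).
const BLOCK_SIZES: [usize; 2] = [128, 256];

/// Smallest block that fits `bytes`, or None if nothing fits.
fn pick_block(bytes: usize) -> Option<usize> {
    BLOCK_SIZES.iter().copied().find(|&b| b >= bytes)
}

/// Capacity a hypothetical future Vec<Goose> could report after
/// reserve_exact(n) if it passed on the whole granted block.
fn capacity_after_reserve_exact(n: usize) -> Option<usize> {
    let block = pick_block(n * size_of::<Goose>())?;
    Some(block / size_of::<Goose>())
}
```

Asking for 4 geese (160 bytes) lands in the 256-byte block, which fits 6.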

Now, what disaster awaits in the common code you're talking about? Capacity is 6 and... there's capacity for 6 entries instead of 4

The condescension isn't appropriate here. I'm talking about using `Vec` as convenient temporary storage without additional bookkeeping on top, which works only if capacity() is meaningful. Like you said, Rust doesn't guarantee that, because `reserve_exact` is not `reserve_exact`. In C++, the pattern is to resize() and shrink_to_fit(), which is implementation-defined, but when it's defined to do what it says, you can rely on it.

> Now, what disaster awaits in the common code you're talking about? Capacity is 6 and... there's capacity for 6 entries instead of 4

The capacity was expected to be 4 and not 6, which may be a logical error in code that requires it to be. If this weren't a problem, the docs wouldn't call it out as a potential problem.

The condescension you've detected is because I doubt your main premise - that what you've described is "common" and so the defined behaviour will have a significant negative outcome. It's no surprise to me that you can offer no evidence for that premise whatsoever and instead just retreat to insisting you were correct anyway.

The resize + shrink_to_fit incantation sounds to me a lot like one of those "sprinkle the volatile keyword until it works" ritualistic C++ practices not based on any facts.

> the pattern is to resize() and shrink_to_fit(),

As someone who primarily writes C++ I would not expect that to work. I mean it's great if it does I guess (I don't really see the point?) but that would honestly surprise me.

I would _always_ expect to use >= for capacity comparisons and I don't understand what the downside would be. The entire point of these data structures is that they manage the memory for you. If you need precise control over memory layout then these are the wrong tools for the job.
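A minimal sketch of the `>=` style, assuming a hypothetical `FixedBuf` type: it keeps the requested limit in its own field (the extra bookkeeping the earlier commenter wanted to avoid), so "full" is defined by that limit, and the only comparison against `capacity()` uses `>=`. `has_room_for` is also a hypothetical helper.

```rust
/// Hypothetical fixed-capacity buffer that stores its own limit, so "full"
/// never depends on the exact value `Vec::capacity()` ends up with.
struct FixedBuf<T> {
    items: Vec<T>,
    limit: usize,
}

impl<T> FixedBuf<T> {
    fn with_limit(limit: usize) -> Self {
        Self { items: Vec::with_capacity(limit), limit }
    }

    fn is_full(&self) -> bool {
        self.items.len() >= self.limit
    }

    /// Pushes `item`, or hands it back if the buffer is already full.
    fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.is_full() {
            return Err(item);
        }
        self.items.push(item);
        Ok(())
    }
}

/// True if `n` more pushes will not reallocate, whether the allocator granted
/// exactly what was requested or more.
fn has_room_for<T>(v: &Vec<T>, n: usize) -> bool {
    v.capacity() - v.len() >= n
}
```

If the allocator rounds the buffer up, `FixedBuf` still reports full at its own limit, and `has_room_for` still answers correctly.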

>But with the upcoming support for passing an allocator to any data structure (in the Rust standard library anyway) I think this gets a lot easier?

Yes and no. Even within libstd, some things require A = Global; e.g., `std::io::Read::read_to_end(&mut Vec<u8>)` will only accept `Vec<u8, Global>`. It cannot be changed to work with `Vec<u8, A>`, because that would make the method generic and the trait would no longer be dyn-compatible (née "object-safe").

And as you said, it will cut you off from much of the third-party crate ecosystem, which also assumes A = Global.

But if the subset of libstd you need supports A != Global, then yes, it's helpful.
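The dyn-compatibility rule behind the `read_to_end` point can be shown on stable Rust without the unstable Allocator API. In this sketch `Source`, `Bytes` and `read_all_into` are hypothetical, and the `B: Extend<u8>` parameter merely stands in for `Vec<u8, A>`. A generic method would normally make the trait unusable as `dyn Source`, because a vtable cannot hold one entry per possible type argument; marking it `where Self: Sized` keeps the trait dyn-compatible, at the cost of that method being unavailable on trait objects.

```rust
trait Source {
    /// Non-generic: callable through `dyn Source`.
    fn read_all(&mut self, buf: &mut Vec<u8>) -> usize;

    /// Generic over the destination. `where Self: Sized` excludes this
    /// method from the vtable, so `dyn Source` is still allowed.
    fn read_all_into<B: Extend<u8>>(&mut self, dst: &mut B) -> usize
    where
        Self: Sized,
    {
        let mut tmp = Vec::new();
        let n = self.read_all(&mut tmp);
        dst.extend(tmp);
        n
    }
}

/// Hypothetical in-memory source that drains its bytes on read.
struct Bytes(Vec<u8>);

impl Source for Bytes {
    fn read_all(&mut self, buf: &mut Vec<u8>) -> usize {
        let n = self.0.len();
        buf.append(&mut self.0);
        n
    }
}
```

Through `Box<dyn Source>` only `read_all` is callable; `read_all_into` works on concrete types such as `Bytes`.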

If the `A` generic parameters were changed to be ?Sized, it would still be possible to make `read_to_end` support custom allocators by changing the signature to `read_to_end(&mut dyn Vec<u8, Allocator>)`

Not sure if that is a breaking change, though; it probably is because of some small detail. I'm not a rustc dev.

First of all, `dyn Vec` is impossible. Vec is a concrete type, not a trait. I assume you meant `Vec<u8, dyn Allocator>`.

Second, no, a `&mut Vec<u8, A>` is not convertible to `&mut Vec<u8, dyn Allocator>`. This kind of unsized coercion cannot work because it'll require a whole different Vec to be constructed, one which has an `allocator: dyn Allocator` field (which is unsized, and thus makes the Vec unsized) instead of an `allocator: A` field. The unsized coercion you're thinking of is for converting a reference to a concrete type into a trait object reference; here we're talking about a field behind a reference.

Sorry, I meant `&Vec<T, dyn Allocator>`.

And no, it is possible. Here is an example that does it with BufReader, which has T: ?Sized and uses it as a field: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Though it comes with a caveat: you can't take self by value, which is perfectly fine for this use case and is what a normal allocator-aware language does anyway.
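The coercion can be reproduced on stable Rust with a stand-in trait, since `std::alloc::Allocator` itself is still unstable. `Alloc`, `Arena`, `MyVec`, `describe` and `push_one` are all hypothetical. The point is only that `&MyVec<u8, Arena>` and `&mut MyVec<u8, Arena>` coerce to `&MyVec<u8, dyn Alloc>` and `&mut MyVec<u8, dyn Alloc>`, because the `?Sized` allocator is the struct's last field.

```rust
/// Stand-in for the unstable `std::alloc::Allocator` trait.
trait Alloc {
    fn name(&self) -> &'static str;
}

struct Arena;

impl Alloc for Arena {
    fn name(&self) -> &'static str {
        "arena"
    }
}

/// Toy Vec-like type whose allocator parameter may be unsized. The unsized
/// field must be declared last. (This toy still stores items in a normal Vec
/// and never actually allocates through `alloc`.)
struct MyVec<T, A: ?Sized + Alloc> {
    items: Vec<T>,
    alloc: A,
}

/// One non-generic signature that accepts any allocator via `dyn Alloc`.
fn describe(v: &MyVec<u8, dyn Alloc>) -> String {
    format!("{} items via {}", v.items.len(), v.alloc.name())
}

/// Works through `&mut` too, which is what a `read_to_end`-style API needs.
fn push_one(v: &mut MyVec<u8, dyn Alloc>, x: u8) {
    v.items.push(x);
}
```

Neither function can take `MyVec<u8, dyn Alloc>` by value, matching the caveat above.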

I stand corrected. I didn't know rustc supported such a coercion automatically. Now I see it is documented in CoerceUnsized + Unsize.

That said, other than the problem of this being a breaking API change for Read::read_to_end, another problem is that Vec's layout is { RawVec, len } and the allocator is inside RawVec, so the allocator is not the last field of Vec, which is required for structs to contain unsized fields. It would require reordering Vec's fields to { len, RawVec } which may not be something libstd wants to do (since it'll make data ptr access have an offset from Vec ptr), or to inline RawVec into Vec as { ptr, cap, len, allocator }.

I’m not sure what those two things have to do with each other, though I did just wake up. The only thing the new allocator stuff would give you is the ability to allocate a standard library data structure with the Postgres allocator. Scoping and handles and such wouldn’t change, and using your own data structures wouldn’t change.

It’s also very possible I’m missing something!

> The only thing the new allocator stuff would give you is the ability to allocate a standard library data structure with the Postgres allocator.

Yeah no this is basically all I'm saying. I'm excited for this.

Ah yeah, well it's gonna be a good feature for sure when it ships!