Hacker News new | ask | show | jobs
by mmoskal 451 days ago
The vec pattern has some advantages though, in particular you can often get away with 32 bit indices (instead of 64 bit pointers), making it a little more cache-friendly. I did it for regex ASTs that are supposed to be hash-consed, so never need to die until the whole matcher dies [0].

A more contrived example is Earley items in parser that point inside a grammar rule (all rules are in flat vec) and back into previous parser state - so I have 2 u32 offsets. If I had pointers, I would be tempted to have a pointer to grammar rule, index inside of it, and pointer to previous state [1], so probably 3x the space.

In both cases, pointers would be easier but slower. Annoying that Rust doesn't really let you make the choice though...

[0] https://github.com/microsoft/derivre/blob/main/src/hashcons.... [1] https://github.com/guidance-ai/llguidance/blob/main/parser/s...

1 comments

By all means. It's a pattern I use all the time, not just in Rust (often getting away with much less than 32-bit indices). You mentioned this and/or alluded to it, but my core complaints are:

1. You don't have much choice in the matter

2. When implementing that strategy, the default (happy path coding) is that you have almost zero choice in when the underlying memory is reclaimed

It's a pattern that doesn't have to be leaky, but especially when you're implementing it in Rust to circumvent borrow-checker limitations, most versions I've seen have been very leaky, not even supporting shrink/resize/reset/... operations to try to at least let the user manually be more careful with leaks.

You should probably use something like https://docs.rs/generational-arena/latest/generational_arena... when you want to do this.

I understand the problem isn’t that the tools exist, it’s that there are Rust users who are not aware of the concept and its evolution over the years, which I’d argue is not a uniquely Rust issue.