Hacker News new | ask | show | jobs
by 110jawefopiwa 540 days ago
I think the main problem is that Rust doesn't allow for basic data structures like graphs or doubly-linked lists without extra effort, which is why I've generally always preferred pseudocode.
3 comments

I often hear this and am confused; not only are things like 'object soup'[0] possible and straightforward (putting things in collections and referring to them by indices), I never concretely hear why a graph or doubly-linked list becomes uniquely difficult to implement in Rust (and would genuinely be curious to learn why you feel this way). If you needed such data structures anyway, they're either in the standard library or in the many libraries ('crates' in Rust-lingo) available on Rust's package registry[1]---using dependencies in Rust is very straightforward & easy.

[0]: https://jacko.io/object_soup.html

[1]: https://crates.io/

The ownership and model makes working with pointers a lot more complicated, which is the core of a good graph or doubly-linked list datatype.

> Learn Rust With Entirely Too Many Linked Lists

> In this series I will teach you basic and advanced Rust programming entirely by having you implement 6 linked lists. In doing so, you should learn:

* The following pointer types: &, &mut, Box, Rc, Arc, *const, *mut, NonNull(?)

* Ownership, borrowing, inherited mutability, interior mutability, Copy

* All The Keywords: struct, enum, fn, pub, impl, use, ...

* Pattern matching, generics, destructors

* Testing, installing new toolchains, using miri

* Unsafe Rust: raw pointers, aliasing, stacked borrows, UnsafeCell, variance

https://rust-unofficial.github.io/too-many-lists/

Object soup is like reinventing the heap, badly.
I have two different reactions to this:

- From the perspective of e.g. C++, I don't think it's reinventing the heap. We often prefer to use indexes into a std::vector or similar, even when putting everything in std::shared_ptr is an option, because the dense layout of the vector gives us the blazing fast cache performance we crave. It also avoids memory leaks from reference cycles.

- From the perspective of e.g. Python, yeah it's reinventing the heap. Unless you need to serialize the whole collection into JSON or something, it's much less convenient. But (almost) no one uses Rust instead of Python just for the convenience. (There are dozens of us! Dozens!)

One of the reasons to reach for a low level language is to have control over tradeoffs specific to your use case instead of using some pre-packaged option. Data structures are a very good example. Generic data structures like those in the standard library will likely suffice for many use cases, but a low level language must give the user the control to write their own. It’s just table stakes for this type of language.
Thankfully, rust does give you that level of control, you can write your own doubly-linked list implementation if you want, and you can pick your safety-overhead tradeoff.
Not sure about Rust std. But on cpp I heard a lot low-level companies avoid using its standard library since it is too bloated and optimized for too many cases.
Well, one can always emulate that in Rust with array indexes. And it will be closer how the actual CPU works where a pointer is just an offset into a memory chunk.

Perhaps this is the reason performance oriented articles prefer Rust. In allows to skip a lot of memory allocation while still catching memory-safety bugs, which is a hard problem with C++.

How could something to implement a pointer be more straightforward than a pointer? People seem to like Rust in excessive, sometimes botheringly excessive proportions.
I don't think the indexing strategy is more straightforward in general. It's a thing you have to learn, and the fact that you have to learn it is part of Rust's famously steep learning curve. That said, there are some common cases where you might wish you'd used indexes instead of pointers in other languages too:

- Working with resizeable collections like std::vector in any non-GC language, where resizing invalidates pointers.

- Serialization. If your objects contain indexes to each other, it's easy to turn them into e.g. JSON. But if they contain pointers to each other, it's tricky.

How are you going to garbage collect inside your array? Sounds like you are just evading the problem and reinventing the heap.
Not easily. But not all use cases require it. The regex crate makes heavy use of this pattern for finite machine states, for example. But this fits nicely with an arena allocation pattern, so everything can just be dropped at once.

Despite this, it's one of the fastest regex engines around: https://github.com/BurntSushi/rebar#summary-of-search-time-b...

Here's a related but more isolated analysis: https://github.com/burntsushi/rsc-regexp

You could probably make an even faster regex by JIT-compiling the resulting automaton.
The benchmarks I linked include multiple regex engines with JITs.
I'm not sure whether this is exactly what you mean, but when you need to support deletions you can switch to a "generational" collection like Slab or SlotMap. Your indexes are still integers, but under the covers some of the bits get repurposed to disambiguate new elements that reuse old slots.
It doesn't though.

Those are fairly straightforward to implement if you are ok using a reference counted type (along with weak references), although that will hurt performance. That would be similar to using a shared_pointer in c++.

And you can implement code similar to what you would in c or c++ if you use raw pointers and unsafe code.

Where you run into difficulties is if you want something that is fast and safe. And then you need a more complicated design that probably involves using indices into a Vec instead of pointers.

Using a Vec for doubly linked lists sounds like overkill. There's got to be more efficient and more elegant way.
Using a Vec actually has some benefits beyond the safety guarantees from rust. There are fewer allocations, and you probably are more likely to get nearby nodes on the same cache line.
Yes, there is, just use plain old pointers. Who cares about UB anyway.