Hacker News new | ask | show | jobs
by haxen 848 days ago
I actually wrote a Rust version as well, and yes, it was far easier to write, far less code (although not incorporating all the tricks), completely safe, and pretty fast -- but still 2x slower than my end result in Java.

HashMap was quite a bottleneck in Rust as well, for many reasons, not just the key allocation problem. But it was very easy to implement the same kind of custom hashtable, which almost by default ends up having a better memory layout than in Java.

https://github.com/mtopolnik/rust-1brc/blob/main/src/main.rs

I bet I could improve on it just a bit and match the Java time. Not sure about topping it, though.

2 comments

> HashMap was quite a bottleneck in Rust as well, for many reasons, not just the key allocation problem.

I'd be curious to hear more. Other than the default hash function being quite expensive, and occasionally wanting to use the more expressive hashbrown API, I've been happy with Rust's std::collections::HashMap.

The reason the custom hashtable wins out isn't something generally applicable. For the very specific dataset used in the challenge, the hash function could be radically simplified, to just a single multiply and rotate left.

To be fair, I didn't try out the hashbrown API. Maybe that, together with FxHash, would have been enough for the official dataset with 97% of keys < 16 bytes. But, I was optimizing for the "10k" dataset as well, with a lot of longer keys, and hashing just the first 8 bytes was the winning idea.

> For the very specific dataset used in the challenge, the hash function could be radically simplified, to just a single multiply and rotate left.

That's just a custom hash function, though; couldn't you do that with the standard std::collections::HashMap?

I didn't bother to try, so not sure. There would probably be some challenges and I don't see how I'd accomplish it without some branch instruction in the custom hasher.
I thought everyone used hashbrown now

https://github.com/rust-lang/hashbrown

They do because the standard library implementation was changed to use Hashbrown. However the standard library API has a slight limitation in that you can't get say "I have a reference to a string. If there's an entry for it, give me that. Otherwise clone the string and insert a new entry. Also only do one key lookup."

You end up either having to do two lookups or always cloning the string even if you ended up not needing it.

You can't. `entry()` takes they key by value, not by reference:

https://doc.rust-lang.org/std/collections/hash_map/struct.Ha...

You might be getting confused because in that example the `HashMap` keys are references.

My bad. That’s not been an issue in my code, as I use arenas to manage memory and never put an owning container as key in a HashMap.