| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by comex 3336 days ago

The main reason it requires unsafe is for memory layout optimizations. Logically, a hash table is an array of buckets, each of which is either a (hash, key, value) triplet or empty. The naive representation would be Vec<Option<(usize, K, V)>>, but using Option would significantly increase the size of each bucket: it's only a one-byte tag to differentiate between Some and None, but alignment constraints would round the overhead up to at least the alignment of usize, typically 8. Instead, HashMap stores just (usize, K, V), and considers a bucket empty whenever 0 is stored in the hash slot; if 0 ever comes up as an actual hash value, it's changed to a different value before being stored.

To the extent I've described it, that might be possible to accomplish in Rust without unsafe code, using a safe abstraction around NonZero (an unsafe type which the compiler assumes will never be zero, allowing it to automatically use a zero value as a marker if the type is found in an enum, such as Option). However, HashMap goes a step further: it actually stores all the hashes contiguously in one array, and the (K, V) pairs in another. The nth index in the hashes array and in the (K, V) array together represent a single bucket, but storing them separately makes HashMap's typical access patterns somewhat nicer on the CPU's cache. However, this means that slots in the (K, V) array can contain either valid data (of arbitrary types, which might contain pointers etc.) or garbage, depending on a value stored somewhere completely different. There's no good 'automatic' way to make that safe.

I think the use of unsafe code here is fine - it's not that hard to audit, and if you don't audit you're in trouble anyway (hash table DoS is also a security risk) - but in the long term, it would be very interesting if Rust could integrate an optional theorem prover, with the ability to prove arbitrarily complex code safe, given sufficient annotations...