Hacker News new | ask | show | jobs
by selljamhere 2935 days ago
>>New hashing algorithm

>>Hash values vary from run to run (so don’t depend on hash values or order)

I must be reading this wrong. One of the requirements for a good hash function is determinism - specifically that any given value in the input space maps to exactly one value in the output space.

5 comments

The hasher in the standard library is defined here: https://github.com/apple/swift/blob/master/stdlib/public/cor...

with some explanation for where the seed comes from here:

https://github.com/apple/swift/blob/master/stdlib/public/cor...

Within that code you will see references to Core which by default is this:

https://github.com/apple/swift/blob/82226642c2459c0f5d2054fe...

So you can see the hash function itself is deterministic given the seed it is initialized with. And the seed is generated at process start time during static initialization in C++ so it is effectively a constant for the lifetime of the process.

You can see in the code that there is a way to make hashing fully deterministic by making the seed generation not use random numbers, but it is ill-advised for a variety of security reasons.

The hashing algorithm can be deterministic however hashing relies on a seed. In most cases you want that to be random to prevent certain side channel attacks. For swift, hashing will be deterministic for the lifetime of the program.
That makes sense. I was thinking in terms of global lookup keys, which would require hashes to persist across runs. When thinking in terms of object equality, the hashes need only live as long as the objects themselves. Namely, for the lifetime of the program.
wait, hashing relies on a seed?! isn't that a salt/pepper?

to be "deterministic for the lifetime of the program" sounds like a pepper is being applied, not that the hash algorithm is different.

This is a more accurate description. The hashing algorithm is 100% deterministic, but the inputs include a random seed from the start of the program to prevent various side channel attacks and to prevent folks from erroneously relying on hashmap ordering.
I think the intended reading of ”so don’t depend on hash values or order” is “do not have your code depend on hash values or iteration order of items sorted by hash value”.

So, what you shouldn’t do is:

- storing a hash value on disk, assuming it to be valid in a later run.

- assume that a hash map that has the same content as a hash map of a previous run has the same iteration order (not even if the calls used to construct them are 100% identical, and not even if both were constructed from the same literal).

I expect this was added to thwart DoS attacks (http://ocert.org/advisories/ocert-2011-003.html)

Note that you can disable this feature, however in most cases it's probably best not to -- the new hasher functionality is actually really cool and a good improvement on the old behavior. If you want to disable it, I think it's a compiler flag.

[* said as someone who's implemented probably a ton of poor hash functions in their time]

I believe the compiler flag just disables the random seed - you can still use the new hasher API
During a particular run of the app, this is still true. However, the hash function uses a seed, which gets changed every time you run the app.