Hacker News new | ask | show | jobs
by koolba 3454 days ago
> Siphash is an attempt to solve the problem. It is more than a hash function - it's a crypto PRF function and that gives you more guarantees than dumb hash function. Most importantly it takes two values: a "string to hash" and a "crypto key": siphash(string, crypto_key) --> number.

That sounds suspiciously like an HMAC.

2 comments

The difference is in the detail.

An HMAC gives you something very nice, but it's of its choosing. If you want a 256-entry cache, the security properties of a 256-bit HMAC result is not a good fit for the needs of a cache key.

If you want an infinitely big cache, an HMAC gives you zero collisions, but in this case what you want is an 8-bit result that spreads nicely over the 256 values, doesn't let an attacker fill one bucket, and doesn't let an attacker learn anything about the other users and their cache entries.

It still sounds like an HMAC albeit with a 64-bit output rather than a larger (say 256-bit) output.

From reading up on it a bit more, it is an MAC[1]. The difference between it and a generic HMAC construction (say with HMAC-SHA256) is that it's intended to specifically be an MAC which leads to more efficiency.

> An HMAC gives you something very nice, but it's of its choosing. If you want a 256-entry cache, the security properties of a 256-bit HMAC result is not a good fit for the needs of a cache key.

It'd be a fine fit from a cryptographic perspective. It's just slow relative to something like this which more catered to the specific problem.

> If you want an infinitely big cache, an HMAC gives you zero collisions, but in this case what you want is an 8-bit result that spreads nicely over the 256 values, doesn't let an attacker fill one bucket, and doesn't let an attacker learn anything about the other users and their cache entries.

The result of SipHash is 8-bytes (not bits). You'd still have to truncate or consolidate the bits to reduce it to 1-byte (8-bits).

[1]: https://en.wikipedia.org/wiki/SipHash

> 8-bit result that spreads nicely over the 256 values, doesn't let an attacker fill one bucket, and doesn't let an attacker learn anything about the other users and their cache entries.

Willing to be educated, but won't I get that by just masking off 8 bits off of say SHA-x?

[edit: h(s + secret-bits) & 0xff ]

Chopped HMAC-SHA will have similar properties to SipHash, however it is a lot slower.
Good to know. Is that a general statement or only true for specific subset of languages? Blake2B, for example, is demonstrably faster than MD5 in C, etc., but just barely faster than SHA-1 on the JVM.
All collision-resistant secure cryptographic hash functions (whew) should have this property (BLAKE2, SHA-3, Skein, etc.) (Note: those vulnerable to length extension — e.g. SHA-1, SHA-2 — should be used in HMAC construction for keying.)

BLAKE2 is fast for many use cases, but its block size is 64 (for BLAKE2s) and 128 (for BLAKEb) bytes, so hashing anything shorter than the block size will take the same time as hashing the full block. SipHash was designed for fast hashing of short inputs (its block size is 8 bytes), which is why it's good for hash tables and similar uses.

As for performance in different languages: SipHash is pretty fast in any language that has native 64-bit integers, but is not so fast in those which don't (JavaScript). The same applies to BLAKE2b (but not BLAKE2s).

BLAKE2 is slower in Java than MD5 probably because it doesn't use SIMD instructions there, which give a good boost for it (as designed).

Great stuff, thanks. (Just realized who you are! Like your Go work.)

Not sure about the SIMD angle & Java. Ref. impl. of Blake2B [in C] doesn't use it [last I looked at it]. I ~think it has to do with the endian bias of the algo.

People do that. I've done it myself. But all the math about SHA's security properties is about absence of collisions in the full result, not about relative frequency in a small part of the result.

Maybe SHA-8 works fine. I don't know. I've never seen any real mathematical investigation of that, and that's the point.

A secure cryptographic hash function's collision resistance should be MIN(output_length/2, claimed_security). For example, SHA-256 collision security is 128 bits with 256-bit output, but if you truncate output to 128 bits, collision resistance will be 64 bits.
Isn't a secure hash supposed to have bit independence?
it is a lot like an HMAC. The advantage of something like SipHash is just that it's faster for the same security, not that it's fundamentally different from HMAC.