| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by eternalban 3453 days ago
	Good to know. Is that a general statement or only true for specific subset of languages? Blake2B, for example, is demonstrably faster than MD5 in C, etc., but just barely faster than SHA-1 on the JVM.

1 comments

dchest 3453 days ago

All collision-resistant secure cryptographic hash functions (whew) should have this property (BLAKE2, SHA-3, Skein, etc.) (Note: those vulnerable to length extension — e.g. SHA-1, SHA-2 — should be used in HMAC construction for keying.)

BLAKE2 is fast for many use cases, but its block size is 64 (for BLAKE2s) and 128 (for BLAKEb) bytes, so hashing anything shorter than the block size will take the same time as hashing the full block. SipHash was designed for fast hashing of short inputs (its block size is 8 bytes), which is why it's good for hash tables and similar uses.

As for performance in different languages: SipHash is pretty fast in any language that has native 64-bit integers, but is not so fast in those which don't (JavaScript). The same applies to BLAKE2b (but not BLAKE2s).

BLAKE2 is slower in Java than MD5 probably because it doesn't use SIMD instructions there, which give a good boost for it (as designed).

link

eternalban 3453 days ago

Great stuff, thanks. (Just realized who you are! Like your Go work.)

Not sure about the SIMD angle & Java. Ref. impl. of Blake2B [in C] doesn't use it [last I looked at it]. I ~think it has to do with the endian bias of the algo.

link

dchest 3453 days ago

(Thanks!:)

There are a few optimized implementation in the BLAKE2 code package — https://github.com/BLAKE2/BLAKE2. "Ref" indeed doesn't use SIMD, but other implementations do use them. I think the claim that it's faster than MD5 is based on benchmarking SIMD implementations, and if I remember correctly, the reference one is a bit slower.

Yes, endianness conversion — that is, just reading uint64 or uint32 from byte arrays — if done in the most simple way by bit shifting can be a factor. Another factor that influenced performance of pure Go implementation (should be similar to Java) is bounds checking.

link

eternalban 3453 days ago

"BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms, and BLAKE2s for smaller architectures. On 64-bit platforms, BLAKE2 is often faster than MD5, yet provides security similar to that of SHA-3. We specify parallel versions BLAKE2bp and BLAKE2sp that are up to 4 and 8 times faster, by taking advantage of SIMD and/or multiple cores."

Well, it's supposed to beat MD5 straight up. Granted the claim is a bit hedged.

> if done in the most simple way by bit shifting can be a factor

Do you mean using byte swap instructions or is there an algorithm that I should know about? :)

link

dchest 3453 days ago

Parallel versions of BLAKE2 are indeed even faster, but optimized implementations of non-parallel BLAKE2s and BLAKE2b also use SIMD instructions in compression function (e.g. https://github.com/BLAKE2/BLAKE2/blob/master/sse/blake2b-rou...).

* * *

Not byte swapping, just simple reading bytes and shifting them into the correct position in uint32 like this:

   uint32 result = (b[3] << 24) | (b[2] << 16)  | (b[1] <<  8) | b[0];

Which is slower than

   uint32 result = *b;

or if LE is not native:

   uint32 result = swap(*b);

link

eternalban 3453 days ago

> optimized implementations of non-parallel BLAKE2s and BLAKE2b also use SIMD

Sure, have seen those. No biggie, but my point was that vanila B2b (on C) is already neck and neck with MD5.

> uint32 result = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];

Yep, akaik that's all you can do on the JVM & of course for 64bit it's even more work. I thought you meant some sort of magic number bit wizzardry :)

link