by dchest 3453 days ago
Parallel versions of BLAKE2 are indeed even faster, but optimized implementations of non-parallel BLAKE2s and BLAKE2b also use SIMD instructions in the compression function (e.g. https://github.com/BLAKE2/BLAKE2/blob/master/sse/blake2b-rou...).

* * *

Not byte swapping, just simply reading bytes and shifting each one into its correct position in a uint32, like this:

   uint32_t result = ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) | ((uint32_t)b[1] << 8) | b[0];
Which is slower than

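   uint32_t result = *(const uint32_t *)b;
or, if little-endian is not the native byte order:

   uint32_t result = __builtin_bswap32(*(const uint32_t *)b);  /* GCC/Clang builtin */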

> optimized implementations of non-parallel BLAKE2s and BLAKE2b also use SIMD

Sure, I've seen those. No biggie, but my point was that vanilla BLAKE2b (in C) is already neck and neck with MD5.

> uint32 result = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];

Yep, afaik that's all you can do on the JVM, and of course for 64-bit it's even more work. I thought you meant some sort of magic-number bit wizardry :)