by dchest
3453 days ago
Parallel versions of BLAKE2 are indeed even faster, but optimized implementations of the non-parallel BLAKE2s and BLAKE2b also use SIMD instructions in the compression function (e.g. https://github.com/BLAKE2/BLAKE2/blob/master/sse/blake2b-rou...).

* * *

Not byte swapping, just reading the bytes one at a time and shifting each into its position in the uint32, like this:

  uint32 result = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];

which is slower than a direct load:

  uint32 result = *(uint32 *)b;

or, if little-endian is not the native byte order:

  uint32 result = swap(*(uint32 *)b);
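A minimal C sketch of the three variants above, assuming b points to at least four bytes (the helper names load32_le_shift, load32_native and load32_le are hypothetical). It uses memcpy instead of *(uint32_t *)b for the direct load, because the cast can fault on misaligned input on some CPUs and violates C's strict-aliasing rule; GCC and Clang typically compile a fixed 4-byte memcpy to a single load. The byte-swap branch assumes GCC/Clang's __BYTE_ORDER__ macro and __builtin_bswap32, so it is not portable to every compiler.

```c
#include <stdint.h>
#include <string.h>

/* Portable little-endian load: assemble four bytes with shifts.
   Gives the same answer on any host byte order. The casts keep the
   shifts in unsigned 32-bit arithmetic (b[3] << 24 on a plain int
   could overflow into the sign bit). */
static uint32_t load32_le_shift(const uint8_t *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Direct load in the host's native byte order.
   Equals load32_le_shift() only on a little-endian host. */
static uint32_t load32_native(const uint8_t *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Little-endian load on any host: native load, then byte-swap when the
   host is big-endian. Assumes GCC/Clang predefined macros and builtins;
   where __BYTE_ORDER__ is undefined no swap is done. */
static uint32_t load32_le(const uint8_t *b)
{
    uint32_t v = load32_native(b);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}
```

Both load32_le_shift and load32_le turn the bytes 01 02 03 04 into 0x04030201; the difference is only in how many instructions the compiler has to emit.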
Sure, I have seen those. No biggie, but my point was that vanilla BLAKE2b (in C) is already neck and neck with MD5.
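For reference, a scalar sketch of the BLAKE2b mixing function G as specified in RFC 7693 (rotation constants 32, 24, 16, 63). BLAKE2b runs 12 rounds of 8 G calls per 128-byte block. The four column calls in a round touch disjoint state words, as do the four diagonal calls, which is what the SSE code linked above exploits by running them side by side in vector lanes. The helper names are hypothetical, and blake2b_round takes message words already arranged in schedule order, leaving out the sigma permutation table.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* BLAKE2b G: mixes message words x and y into state words a, b, c, d.
   Additions wrap modulo 2^64 because the operands are uint64_t. */
static void blake2b_g(uint64_t v[16], int a, int b, int c, int d,
                      uint64_t x, uint64_t y)
{
    v[a] = v[a] + v[b] + x;
    v[d] = rotr64(v[d] ^ v[a], 32);
    v[c] = v[c] + v[d];
    v[b] = rotr64(v[b] ^ v[c], 24);
    v[a] = v[a] + v[b] + y;
    v[d] = rotr64(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];
    v[b] = rotr64(v[b] ^ v[c], 63);
}

/* One round over message words m[0..15] already permuted by sigma. */
static void blake2b_round(uint64_t v[16], const uint64_t m[16])
{
    /* Columns: the four calls share no state words. */
    blake2b_g(v, 0, 4,  8, 12, m[0],  m[1]);
    blake2b_g(v, 1, 5,  9, 13, m[2],  m[3]);
    blake2b_g(v, 2, 6, 10, 14, m[4],  m[5]);
    blake2b_g(v, 3, 7, 11, 15, m[6],  m[7]);
    /* Diagonals: likewise independent of each other. */
    blake2b_g(v, 0, 5, 10, 15, m[8],  m[9]);
    blake2b_g(v, 1, 6, 11, 12, m[10], m[11]);
    blake2b_g(v, 2, 7,  8, 13, m[12], m[13]);
    blake2b_g(v, 3, 4,  9, 14, m[14], m[15]);
}
```

The scalar version is just 64-bit adds, XORs and rotates, which is why plain C BLAKE2b already performs well on 64-bit CPUs without SIMD.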
> uint32 result = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];
Yep, afaik that's all you can do on the JVM, and of course for 64-bit it's even more work. I thought you meant some sort of magic-number bit wizardry :)
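A minimal C sketch of the 64-bit case (load64_le is a hypothetical helper name): the same shift-and-OR idea as the quoted 32-bit expression, but over eight bytes. BLAKE2b reads each 128-byte block as sixteen little-endian 64-bit words, so without a native load this runs sixteen times per block.

```c
#include <stdint.h>

/* Portable little-endian 64-bit load: eight shifts and ORs.
   Walks from the most significant byte b[7] down to b[0], so b[0]
   ends up in the lowest 8 bits of the result. */
static uint64_t load64_le(const uint8_t *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

For example, the bytes 01 02 03 04 05 06 07 08 load as 0x0807060504030201.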