Hacker News
by leo_bloom 1786 days ago
If I had a BLAKE3 implementation available in my programming language of choice, is there any reason to still prefer the SHA family over it (for integrity checks, not for password hashing, as mentioned in the README)?

Aside from the various technical reasons others have given, I'd encourage you to look at the underlying design of SHA-3: it’s really elegant, with many applications beyond hash functions. Ironically, I feel that SHA-3 should make block ciphers like AES obsolete more than it makes SHA-2 obsolete.

https://keccak.team/sponge_duplex.html
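The sponge idea behind that link fits in a few lines. This is a toy sketch, assuming a made-up 32-byte permutation (`toy_permutation`, hypothetical) in place of Keccak-f[1600] and a simplified byte-level pad10*1 padding; it only illustrates the absorb/squeeze structure and has no security. The last line uses the real SHAKE128 from Python's `hashlib` to show the same extendable-output behaviour.

```python
import hashlib

RATE = 8                 # r: bytes XORed in / read out per permutation call
CAPACITY = 24            # c: bytes never directly touched by input or output
WIDTH = RATE + CAPACITY


def toy_permutation(state: bytearray) -> None:
    """Hypothetical stand-in for Keccak-f: a bijective byte mixer, NOT secure."""
    for rnd in range(8):
        for i in range(WIDTH):
            x = (state[(i + 1) % WIDTH] + rnd + i) & 0xFF
            state[i] ^= ((x << 3) | (x >> 5)) & 0xFF  # 8-bit rotate left by 3


def toy_sponge(message: bytes, out_len: int) -> bytes:
    # Pad: append 0x01, zero-fill to a multiple of RATE, set the top bit of the last byte.
    padded = bytearray(message) + b"\x01"
    while len(padded) % RATE:
        padded.append(0)
    padded[-1] |= 0x80

    state = bytearray(WIDTH)
    # Absorb: XOR each RATE-sized block into the outer part of the state, then permute.
    for off in range(0, len(padded), RATE):
        for i in range(RATE):
            state[i] ^= padded[off + i]
        toy_permutation(state)

    # Squeeze: read RATE bytes at a time, permuting between reads.
    out = bytearray()
    while True:
        out += state[:RATE]
        if len(out) >= out_len:
            return bytes(out[:out_len])
        toy_permutation(state)


# Output is extendable: a shorter output is a prefix of a longer one.
assert toy_sponge(b"abc", 20)[:8] == toy_sponge(b"abc", 8)
# Real SHAKE128 (a Keccak sponge) behaves the same way.
assert hashlib.shake_128(b"abc").digest(32)[:16] == hashlib.shake_128(b"abc").digest(16)
```

The capacity bytes are what the security of a real sponge rests on: input and output only ever touch the rate part, so an attacker never sees or controls the whole state directly.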

> look at the underlying design of SHA-3 - it’s really elegant

Yes, I implemented a whole pile of hash functions, and I agree wholeheartedly. Whereas md5/sha seem to have been designed by pouring a hodgepodge of complexity into an algorithm until something indecipherable turned up, sha3 is simple. It's just a small number of easily understood operations, each with a clear purpose.

Actually, it looked to me like it's been an evolution: md5 is insanely complex, the sha2 family got simpler, and then we get sha3.

Symmetric algorithms look to be going the same way. DES is insanely complex, AES less so, and Speck is almost unbelievably simple (look at the source code on Wikipedia: https://en.wikipedia.org/wiki/Speck_(cipher)). It seems to be an unfashionable viewpoint, but in my mind that simplicity makes Speck seem more worthy of trust than a lot of its rivals.

Mind you,

SHA-3 is extremely slow compared to common ciphers like AES and ChaCha20. Sponge functions might someday become the building blocks of symmetric ciphers, but it's unlikely that SHA-3 will (without hardware acceleration).
For historical reasons the SHA-3 standard made extremely conservative choices with its security parameters, particularly the number of rounds. The result is that SHA-3 is slower than SHA-2 in a lot of cases, but it didn't have to be that way. The same team of cryptographers published the KangarooTwelve hash in 2016, with half the number of rounds. I think that implies that SHA-3 could've been twice as fast with no loss in security. KangarooTwelve also introduces a tree structure, which enables a lot of the same optimizations that you see in BLAKE3, and the two designs are interesting to compare. (See section 7.6 of the BLAKE3 paper.)
Well SHA-3 is a hash function, and indeed somewhat slow in software. But the team have since enormously expanded the primitives based on the same core design, with much better performance: https://keccak.team/sw_performance.html

You can also look at things like the Strobe framework, which builds essentially all of its symmetric crypto out of the SHA-3 core permutation: https://strobe.sourceforge.io/
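As a rough illustration of getting symmetric encryption out of a sponge instead of a block cipher, the sketch below treats SHAKE256 (from Python's `hashlib`) as a keystream generator keyed by `key || nonce`. It is a toy under stated assumptions, not Strobe's protocol or any standardized mode: it provides no authentication, and the name `xof_encrypt` is hypothetical.

```python
import hashlib
import os


def xof_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hypothetical sponge-based stream cipher: XOR data with SHAKE256(key || nonce)."""
    # Fixed lengths keep the concatenation key || nonce unambiguous.
    if len(key) != 32 or len(nonce) != 16:
        raise ValueError("expected a 32-byte key and a 16-byte nonce")
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


# XOR is its own inverse, so the same function decrypts.
key, nonce = os.urandom(32), os.urandom(16)
ciphertext = xof_encrypt(key, nonce, b"attack at dawn")
assert xof_encrypt(key, nonce, ciphertext) == b"attack at dawn"
```

As with any stream cipher, reusing a nonce under the same key leaks the XOR of the two plaintexts; a real design would add a MAC, which duplex-based frameworks such as Strobe build from the same permutation.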

BLAKE3 has a large state: it's a tree hash, so you need to keep the tree of hashes, whose size grows with the logarithm of the message size. SHA has a fixed-size state for any message.
Huh, I didn't know that.

https://github.com/BLAKE3-team/BLAKE3/blob/b404c851c284ed01f...

I peeked at the reference implementation (380 lines of safe Rust) and found this: it has a 1,728-byte array for the tree state, which is enough for inputs up to 2^64 bytes.

So in practice it's also fixed.
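The 1,728-byte figure follows from BLAKE3's 1 KiB chunks: 2^64 bytes is 2^54 chunks, so at most 54 subtree chaining values of 32 bytes each are held at once, and 54 × 32 = 1,728. The sketch below shows the stack-based merging that keeps the state logarithmic. It assumes SHA-256 with one-byte prefixes (`leaf_cv` and `parent_cv`, both hypothetical) as a stand-in for BLAKE3's compression function and flags, so its output is not a BLAKE3 hash; it also merges eagerly, whereas BLAKE3 delays the final merges so it can flag the root node.

```python
import hashlib

CHUNK_LEN = 1024                 # BLAKE3 chunk size in bytes (2**10)
MAX_DEPTH = 64 - 10              # 2**64 bytes / 2**10 bytes per chunk = 2**54 chunks
assert MAX_DEPTH * 32 == 1728    # 54 chaining values of 32 bytes each


def leaf_cv(chunk: bytes) -> bytes:
    # Hypothetical stand-in for BLAKE3's chunk compression.
    return hashlib.sha256(b"\x00" + chunk).digest()


def parent_cv(left: bytes, right: bytes) -> bytes:
    # Hypothetical stand-in for BLAKE3's parent-node compression.
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_hash(data: bytes) -> tuple[bytes, int]:
    """Return (root, largest stack size seen) for a binary tree over 1 KiB chunks."""
    stack: list[bytes] = []
    peak = 0
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)] or [b""]
    for count, chunk in enumerate(chunks, start=1):
        cv = leaf_cv(chunk)
        # Each trailing zero bit in the chunk count marks a completed subtree to merge.
        while count % 2 == 0:
            cv = parent_cv(stack.pop(), cv)
            count //= 2
        stack.append(cv)
        peak = max(peak, len(stack))
    # Fold the remaining subtrees (largest on the left) from right to left into the root.
    root = stack.pop()
    while stack:
        root = parent_cv(stack.pop(), root)
    return root, peak
```

After n chunks the stack holds one chaining value per 1 bit in the binary form of n, which is why its size is bounded by the logarithm of the input length: 7 chunks (binary 111) peak at three entries.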

Federal government (FIPS) compliance.
Security-wise they are roughly equivalent. While SHA has had more eyes on it, I doubt either construction will ever be practically broken at hash sizes like 384 or 512 bits. Someone may find an "academic break" at some point.

BLAKE3 is faster, sometimes a lot faster, on hardware without SHA instructions. On hardware with SHA instructions SHA may be faster. Same as the AES story where AES is faster than ChaCha on CPUs with dedicated hardware but slower otherwise.

According to zooko, one of the authors, BLAKE3 beats SHA-256 on newer CPUs even with hardware acceleration: https://twitter.com/zooko/status/1419403567320821760
I'd like to see the benchmarks, including power draw. I suspect it is similar to software ChaCha vs. hardware AES: a software ChaCha implementation can reach speeds similar to hardware AES, but at the cost of significantly higher power draw, because it pushes the AVX units to near-maximum utilization.
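For checking such claims on your own machine, a rough single-threaded throughput comparison needs only the standard library. This sketch times `hashlib`'s SHA-256, SHA3-256 and BLAKE2b, plus BLAKE3 only if the third-party `blake3` package happens to be installed (an assumption; it is not in the standard library). It measures speed, not power draw, and whether SHA-256 uses the CPU's SHA extensions depends on the OpenSSL build behind `hashlib`.

```python
import hashlib
import time


def throughput_mib_s(hash_fn, data: bytes, repeats: int = 5) -> float:
    """Best-of-N throughput in MiB/s for hashing `data` once per run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / max(best, 1e-9)


def main(size_mib: int = 64) -> dict[str, float]:
    data = bytes(size_mib * 1024 * 1024)
    candidates = {
        "sha256": lambda d: hashlib.sha256(d).digest(),
        "sha3_256": lambda d: hashlib.sha3_256(d).digest(),
        "blake2b": lambda d: hashlib.blake2b(d).digest(),
    }
    try:
        from blake3 import blake3  # optional third-party package
        candidates["blake3"] = lambda d: blake3(d).digest()
    except ImportError:
        pass
    results = {name: throughput_mib_s(fn, data) for name, fn in candidates.items()}
    for name, speed in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name:>9}: {speed:8.1f} MiB/s")
    return results


if __name__ == "__main__":
    main()
```

Taking the best of several runs reduces noise from frequency scaling and other processes, but the numbers are still only indicative for one machine.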
I benchmarked hardware AES vs. software ChaCha20, and AES improved the overall performance of an end-to-end QUIC software stack by more than 50%. The pure crypto difference is probably even larger. That's a huge gap, even though it's entirely possible that Ring's ChaCha20 implementation could still be improved.

As a result, I asked rustls to default to AES instead of the previous ChaCha20 default [1]

[1] https://github.com/ctz/rustls/issues/509

Some implementations (IIRC Firefox is one of them) choose it dynamically: if hardware AES is available, AES is ordered before ChaCha20, otherwise ChaCha20 is ordered first. Last time I looked, at least for Intel the i3 didn't have hardware AES (but the i5 and i7 have it), so it was not uncommon for lower-end hardware (which needs speed the most) to have AES slower than ChaCha20.
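The dynamic ordering described above can be sketched as a small preference function. Checking `/proc/cpuinfo` for the `aes` flag is one assumed detection method that only works on x86 Linux (other platforms need other checks), and both function names are hypothetical; the two strings are the standard TLS 1.3 cipher suite names.

```python
from pathlib import Path

AES_GCM = "TLS_AES_128_GCM_SHA256"
CHACHA = "TLS_CHACHA20_POLY1305_SHA256"


def has_aes_hardware(cpuinfo: Path = Path("/proc/cpuinfo")) -> bool:
    """Hypothetical helper: True if an x86 Linux CPU advertises the AES-NI 'aes' flag."""
    try:
        text = cpuinfo.read_text()
    except OSError:
        return False  # unknown platform: assume no AES hardware
    for line in text.splitlines():
        if line.startswith("flags"):
            return "aes" in line.split(":", 1)[1].split()
    return False


def cipher_preference(aes_hw: bool) -> list[str]:
    """Put AES-GCM first only when hardware AES is available."""
    return [AES_GCM, CHACHA] if aes_hw else [CHACHA, AES_GCM]


print(cipher_preference(has_aes_hardware()))
```

Both suites stay enabled either way; only the order changes, so a peer that supports just one of them can still connect.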
It should be noted that ChaCha20 has an insanely comfortable security margin. By a more accurate estimate, ChaCha8 has a security margin similar to that of AES, and ChaCha20 has 2.5 times as many rounds (20 vs. 8), so consider whether that's worth the cost.
Unfortunately almost no one can use ChaCha8 since the standards all call for ChaCha20.
And if you're designing hardware (or on an FPGA) ChaCha (and BLAKE2/3) are actually really freakin' fast for the die area they take. IIRC ChaCha20 beats AES in hardware. It's just extremely rare (FPGA only) to find it in hardware, since there's not enough demand. I expect that to eventually change.
Given that all these are ARX cores, I wonder if a fused ARX instruction could cover a wide range of them?
ARMv8.2 has rotate-and-xor and xor-and-rotate so that the (extremely cheap) xor can be saved.
Probably. With a barrel shifter easily. Barrel shifters are a bit slower than wired shifts though, so for ultimate speed you'd end up with hardware shift amounts.
The advantage would be a single instruction that implemented the core of a wide range of things including BLAKE2, BLAKE3, Salsa, ChaCha, Speck, and more.
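For reference, the ARX core under discussion is small. Below is the ChaCha quarter round as specified in RFC 8439 (add, XOR, rotate on 32-bit words), checked against that RFC's quarter-round test vector. A double round applies it to the four columns and then the four diagonals of the 16-word state, so ChaCha20 runs 10 double rounds and ChaCha8 runs 4.

```python
MASK = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    # Each pair of lines is one add, xor, rotate step: the pattern a fused ARX instruction would target.
    a = (a + b) & MASK
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 7)
    return a, b, c, d


def double_round(s: list[int]) -> None:
    """One column round plus one diagonal round on a 16-word ChaCha state, in place."""
    for i0, i1, i2, i3 in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
                           (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
        s[i0], s[i1], s[i2], s[i3] = quarter_round(s[i0], s[i1], s[i2], s[i3])


# RFC 8439, section 2.1.1 test vector.
assert quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567) == (
    0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
```

The rotation amounts (16, 12, 8, 7) are fixed constants, which is why hard-wired shifts are an option in dedicated hardware.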
One aspect of security is also misuse resistance. You can of course create a secure MAC with SHA-256 in the HMAC configuration, but it usually takes a master's-level course in cryptography to know what the Merkle-Damgård construction is and why its design is imperfect:

You can't just do SHA256(key + message) to generate a safe MAC. With BLAKE (and all SHA3 finalists) you can do that safely.

It's true that every time you make an algorithm more misuse-resistant, the universe comes up with a bigger Dunning-Kruger case, but misuse resistance is still something that can actually improve, whereas the security margin is already more than adequate. As Schneier so eloquently put it, "we're building a fence for sheep, it doesn't matter if the fence pole is a mile or two miles high".
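With Python's standard library, the two safe options look like this: HMAC wraps SHA-256 so that length extension does not apply, while BLAKE2b (BLAKE3's predecessor, which `hashlib` ships; BLAKE3 itself needs a third-party package) takes the key directly as a parameter. The key and message values are placeholders.

```python
import hashlib
import hmac

key = b"an example 32-byte secret key!!!"  # placeholder key
message = b"amount=100&to=alice"

# HMAC-SHA256: the standard way to key a Merkle-Damgård hash.
tag_hmac = hmac.new(key, message, hashlib.sha256).digest()

# Keyed BLAKE2b: the hash's built-in keyed mode (keys up to 64 bytes).
tag_blake2 = hashlib.blake2b(message, key=key, digest_size=32).digest()

# NOT a safe MAC with SHA-256: vulnerable to length extension.
tag_naive = hashlib.sha256(key + message).digest()


def verify(expected: bytes, received: bytes) -> bool:
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, received)
```

The keyed BLAKE2b call is a single pass over the message, whereas HMAC hashes twice (inner and outer hash), though the outer hash only covers a short input.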

Good point. Misuse resistance is also why I am a fan of SIV constructions for stream ciphers, since "repeat a nonce = instant death" is a footgun.

Repeating a nonce is easier than you might think if you are using threads and accessing a nonce counter non-atomically, have a bad RNG, are on an embedded platform with bad RNG seeding, have a bug that overwrites some memory used to generate nonces, or just transfer a ton of data under the same key with random nonces (birthday attack). SIV makes nonce reuse fairly benign. The only consequence is that if you happen to reuse a nonce with two identical messages, an attacker can tell that you sent the same message twice. That's generally not catastrophic and is statistically far less likely than repeating a nonce with different messages, which is generally harmless in SIV.

You could theoretically use SIV with no nonce, with the only consequence being that an attacker could always tell if you sent duplicate messages. Not sure why you'd do that though.

IMHO since we now have ciphers that are probably "unbreakable for the foreseeable future" (e.g. AES and ChaCha) we should probably concentrate on creating and popularizing misuse-resistant constructions as much as possible. It's good to remove footguns.
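One failure mode mentioned above, a counter shared between threads without synchronization, is easy to guard against. A minimal sketch, assuming 96-bit big-endian counter nonces (the nonce size used by ChaCha20-Poly1305 and AES-GCM); the `NonceCounter` class is hypothetical, and it refuses to wrap around rather than silently repeat.

```python
import threading


class NonceCounter:
    """Hypothetical thread-safe source of unique 96-bit counter nonces for one key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next = 0

    def next_nonce(self) -> bytes:
        # Read, check and increment happen under the lock, so two threads
        # can never observe the same counter value.
        with self._lock:
            if self._next >= 2**96:
                raise OverflowError("nonce space exhausted; rotate the key")
            value = self._next
            self._next += 1
        return value.to_bytes(12, "big")
```

With random 96-bit nonces instead, the birthday bound means a collision becomes likely after roughly 2^48 messages under one key, which is the bulk-transfer case mentioned above.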

I hadn't really looked into SIV, as I've only written code that always generates XChaCha nonces with getrandom, but I can totally see how the platform and similar factors could lead to nonce reuse. This was a very informative post, thank you so much!
SIV is usually done with AES/GMAC constructions but you could do it with ChaChaPoly just fine.

The big downside is that it requires two passes to encrypt: one to compute the MAC and derive the IV, and another to encrypt. The overhead is small for message- or packet-based systems, though, since after the first pass the data will be sitting hot in the processor's L1 cache. Decryption can be done in one pass.
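The SIV idea can be sketched with standard-library primitives. This is a toy, assuming HMAC-SHA256 as the PRF that derives the synthetic IV and SHAKE256 as the keystream; it is not AES-SIV (RFC 5297) or AES-GCM-SIV (RFC 8452), and `siv_encrypt`/`siv_decrypt` are hypothetical names. Encryption makes the two passes described above (MAC, then encrypt), and the IV doubles as the authentication tag.

```python
import hashlib
import hmac


def _keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    return hashlib.shake_256(enc_key + iv).digest(length)


def _synthetic_iv(mac_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # PRF over the nonce AND the whole plaintext (nonce assumed fixed-length).
    return hmac.new(mac_key, nonce + plaintext, hashlib.sha256).digest()[:16]


def siv_encrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    iv = _synthetic_iv(mac_key, nonce, plaintext)           # pass 1
    ks = _keystream(enc_key, iv, len(plaintext))            # pass 2
    return iv + bytes(p ^ k for p, k in zip(plaintext, ks))


def siv_decrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, blob: bytes) -> bytes:
    iv, ciphertext = blob[:16], blob[16:]
    ks = _keystream(enc_key, iv, len(ciphertext))
    plaintext = bytes(c ^ k for c, k in zip(ciphertext, ks))
    # Recompute the IV from the recovered plaintext; a mismatch means tampering.
    if not hmac.compare_digest(iv, _synthetic_iv(mac_key, nonce, plaintext)):
        raise ValueError("authentication failed")
    return plaintext
```

Because the IV depends on the plaintext, reusing `nonce` with two different messages still yields different IVs and keystreams; only identical messages under the same nonce produce identical output.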

Aren't you supposed to MAC the encrypted data?
> You can't just do SHA256(key + message) to generate a safe MAC.

Can you explain this?

A SHA-256 hash is just a dump of the internal state of the function. If you know the hash, you can keep running the hash function over more data and calculate a valid hash for the original data with new data appended.
What @dagenix said. See e.g. Thomas Pornin's answer here https://crypto.stackexchange.com/a/3979 for more details
If you have the output

    h = SHA-256(k || m1)
you can easily compute a function `F(h, m2)` such that

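    SHA-256(k || m1 || p || m2) = F(h, m2)
where `p` is the padding SHA-256 appended to `k || m1` (computable from the length of `k` alone), allowing you to forge a verifier for `m1 || p || m2` under `k` for any `m2` you wish without actually knowing `k`.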
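To make `F` concrete, here is a sketch in pure Python: a plain SHA-256 compression function (round constants derived from cube roots of the first 64 primes, as FIPS 180-4 defines them, rather than typed in) restarted from the leaked digest. It assumes the attacker knows the key's length; `extend` is a hypothetical helper name. Note that the forged message is `m1`, then the original padding ("glue"), then `m2`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _first_primes(count: int) -> list[int]:
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def _icbrt(n: int) -> int:
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _first_primes(64)]


def _rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state: list[int], block: bytes) -> list[int]:
    """One SHA-256 compression of a 64-byte block into an 8-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def _padding(length: int) -> bytes:
    """SHA-256 padding for a `length`-byte message: 0x80, zeros, 64-bit bit length."""
    return b"\x80" + b"\x00" * ((55 - length) % 64) + struct.pack(">Q", length * 8)


def extend(digest: bytes, key_len: int, m1: bytes, m2: bytes) -> tuple[bytes, bytes]:
    """Given SHA-256(k || m1) and len(k), forge SHA-256(k || m1 || glue || m2)."""
    glue = _padding(key_len + len(m1))          # padding the victim's hash already applied
    state = list(struct.unpack(">8I", digest))  # the digest IS the internal state
    processed = key_len + len(m1) + len(glue)   # bytes already compressed (multiple of 64)
    tail = m2 + _padding(processed + len(m2))
    for off in range(0, len(tail), 64):
        state = _compress(state, tail[off:off + 64])
    return m1 + glue + m2, struct.pack(">8I", *state)


key = b"server-secret"  # unknown to the attacker; only its length is assumed known
mac = hashlib.sha256(key + b"user=alice").digest()
forged_msg, forged_mac = extend(mac, len(key), b"user=alice", b"&admin=true")
assert forged_mac == hashlib.sha256(key + forged_msg).digest()
```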
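You can with the truncated versions, though: SHA-384 and SHA-512/256 don't reveal their full internal state, so they aren't vulnerable to length extension.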
Single-threaded BLAKE3 is about 1.9x faster than hardware-accelerated SHA-256 on my Zen 2 CPU.
Hardware implementation.