| HN Mirror



	by api 1787 days ago
	Security-wise they are roughly equivalent. While SHA has had more eyes on it I doubt either construction will ever be practically broken at hash sizes like 384 or 512 bits. Someone may find an "academic break" at some point. BLAKE3 is faster, sometimes a lot faster, on hardware without SHA instructions. On hardware with SHA instructions SHA may be faster. Same as the AES story where AES is faster than ChaCha on CPUs with dedicated hardware but slower otherwise.

4 comments

nonagono 1787 days ago

According to zooko, one of the authors, on new-ish CPUs BLAKE3 beats SHA-256 even with hardware acceleration: https://twitter.com/zooko/status/1419403567320821760


magila 1787 days ago

I'd like to see the benchmarks, including power draw. I suspect it is similar to soft ChaCha vs hard AES. A ChaCha software implementation can achieve speed similar to the AES hardware, at the cost of significantly higher power draw from pushing the AVX units to near maximum utilization.


Matthias247 1787 days ago

I benchmarked hardware AES vs software ChaCha20, and the former improved the overall performance of an end-to-end QUIC software stack by more than 50%. The pure crypto difference is probably even higher. That's a huge gap, even though it is entirely possible that the ChaCha20 implementation in Ring can still be improved.

As a result of that, I asked for rustls to default to AES instead of the previous ChaCha20 default [1]

[1] https://github.com/ctz/rustls/issues/509


cesarb 1787 days ago

Some implementations (IIRC Firefox is one of them) choose it dynamically: if hardware AES is available, AES is ordered before ChaCha20, otherwise ChaCha20 is ordered first. Last time I looked, at least for Intel the i3 didn't have hardware AES (but the i5 and i7 have it), so it was not uncommon for lower-end hardware (which needs speed the most) to have AES slower than ChaCha20.
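The dynamic ordering described above can be sketched roughly like this. This is not how Firefox or any particular TLS library does it; `cpu_has_aes` and `preferred_suites` are hypothetical helpers, and the detection assumes a Linux-style `/proc/cpuinfo` (it simply reports False anywhere else). The suite names are the standard TLS 1.3 ones.

```python
AES_SUITES = ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"]
CHACHA_SUITES = ["TLS_CHACHA20_POLY1305_SHA256"]


def cpu_has_aes(cpuinfo_path="/proc/cpuinfo"):
    """Best-effort check for an 'aes' CPU flag (assumption: Linux cpuinfo format)."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # x86 lists capabilities under "flags", ARM under "Features".
                if line.lower().startswith(("flags", "features")):
                    _, _, value = line.partition(":")
                    if "aes" in value.split():
                        return True
    except OSError:
        pass  # File missing or unreadable: treat as "no hardware AES".
    return False


def preferred_suites(has_hw_aes):
    """Put AES first only when the CPU accelerates it, otherwise ChaCha20 first."""
    if has_hw_aes:
        return AES_SUITES + CHACHA_SUITES
    return CHACHA_SUITES + AES_SUITES


if __name__ == "__main__":
    print(preferred_suites(cpu_has_aes()))
```

Both families stay in the list either way, so a peer that only supports one of them can still negotiate; only the preference changes.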


GoblinSlayer 1787 days ago

It should be noted that ChaCha20 has an insanely comfortable security margin. By a more accurate estimate, ChaCha8 has a security margin similar to that of AES, and ChaCha20 has 2.5 times as many rounds, so consider whether it's worth the cost.


wmf 1787 days ago

Unfortunately almost no one can use ChaCha8 since the standards all call for ChaCha20.


bobbyskelton41 1787 days ago

That sort of depends: https://github.com/rui314/mold/issues/92#issuecomment-879709...


SAI_Peregrinus 1787 days ago

And if you're designing hardware (or on an FPGA) ChaCha (and BLAKE2/3) are actually really freakin' fast for the die area they take. IIRC ChaCha20 beats AES in hardware. It's just extremely rare (FPGA only) to find it in hardware, since there's not enough demand. I expect that to eventually change.


api 1787 days ago

Given that all these are ARX cores, I wonder if a fused ARX instruction could cover a wide range of them?


brandmeyer 1787 days ago

ARMv8.2 has rotate-and-xor and xor-and-rotate so that the (extremely cheap) xor can be saved.


SAI_Peregrinus 1787 days ago

Probably. With a barrel shifter, easily. Barrel shifters are a bit slower than wired shifts though, so for ultimate speed you'd end up with hardwired shift amounts.


api 1787 days ago

The advantage would be a single instruction that implemented the core of a wide range of things including BLAKE2, BLAKE3, Salsa, ChaCha, Speck, and more.
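For a concrete picture of the shared ARX core, here is a minimal sketch of the ChaCha quarter round written as four add-xor-rotate steps. The `arx` helper is a hypothetical stand-in for the kind of fused instruction discussed here, not a real CPU instruction. The other primitives differ in details (BLAKE2 and BLAKE3 rotate right and mix in message words, for example), so a single instruction would need the rotation amount and direction as parameters.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def arx(a, b, d, r):
    """One fused step: a += b; d ^= a; d <<<= r (hypothetical 'instruction')."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, r)
    return a, d


def quarter_round(a, b, c, d):
    """ChaCha quarter round as specified in RFC 8439, section 2.1."""
    a, d = arx(a, b, d, 16)
    c, b = arx(c, d, b, 12)
    a, d = arx(a, b, d, 8)
    c, b = arx(c, d, b, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Quarter-round test vector from RFC 8439, section 2.1.1.
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])
    # -> ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```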


maqp 1787 days ago

One aspect of security is also misuse resistance. You can of course create a secure MAC with SHA-256 in the HMAC configuration, but it usually takes a master's-level course on cryptography to know what the Merkle-Damgård construction is and why its design is imperfect:

You can't just do SHA256(key + message) to generate a safe MAC. With BLAKE (and all SHA3 finalists) you can do that safely.

It's true that every time you make an algorithm more misuse resistant, the universe comes up with a bigger Dunning-Kruger case. Despite that, misuse resistance is something that can actually improve, while the security is already more than adequate. As Schneier so eloquently put it, "we're building a fence for sheep, it doesn't matter if the fence pole is a mile or two miles high".
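As a rough illustration of the two safe options, using only the Python standard library: HMAC wraps SHA-256 so the Merkle-Damgård weakness doesn't matter, while a hash with a built-in keyed mode takes the key directly. BLAKE3 isn't in the standard library, so BLAKE2b's keyed mode stands in for it here; that substitution is an assumption of this sketch, and the function names are made up for illustration.

```python
import hashlib
import hmac


def mac_hmac_sha256(key, message):
    """Safe MAC built on SHA-256: HMAC, not SHA256(key + message)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mac_blake2b(key, message):
    """Keyed BLAKE2b (key of at most 64 bytes), no HMAC wrapper needed."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()


def verify(tag, expected):
    """Constant-time comparison, to avoid leaking the tag via timing."""
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    key = b"0123456789abcdef0123456789abcdef"
    msg = b"amount=100&to=alice"
    print(mac_hmac_sha256(key, msg).hex())
    print(mac_blake2b(key, msg).hex())
```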


api 1787 days ago

Good point. Misuse resistance is also why I am a fan of SIV constructions for stream ciphers, since "repeat a nonce = instant death" is a footgun.

Repeating a nonce is easier than you might think if you are using threads and accessing a nonce counter non-atomically, have a bad RNG, are on an embedded platform with bad RNG seeding, have a bug that overwrites some memory used to generate nonces, or just transfer a ton of data with the same key (birthday attack). SIV makes nonce reuse fairly benign. The only consequence is that if you happen to reuse a nonce with two identical messages, an attacker could tell that you sent the same message twice. That's generally not catastrophic and statistically is far less likely than repeating a nonce with different messages. Repeating a nonce with different messages generally does nothing in SIV.

You could theoretically use SIV with no nonce, with the only consequence being that an attacker could always tell if you sent duplicate messages. Not sure why you'd do that though.

IMHO since we now have ciphers that are probably "unbreakable for the foreseeable future" (e.g. AES and ChaCha) we should probably concentrate on creating and popularizing misuse-resistant constructions as much as possible. It's good to remove footguns.
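On the threading footgun specifically: a minimal sketch of a nonce counter whose read-and-increment happens under a lock, so two threads can never be handed the same value. `NonceCounter` is a hypothetical helper, and it does nothing about the other failure modes listed above (bad RNG, memory corruption, or restarting a process with the same key and a fresh counter).

```python
import threading


class NonceCounter:
    """Hands out unique fixed-size nonces; safe to share between threads."""

    def __init__(self, size=12):
        self._lock = threading.Lock()
        self._next = 0
        self._size = size

    def next_nonce(self):
        # Read and increment as one atomic step under the lock.
        with self._lock:
            value = self._next
            self._next += 1
        # to_bytes raises OverflowError once the counter no longer fits,
        # which is preferable to silently wrapping around and reusing nonces.
        return value.to_bytes(self._size, "big")


if __name__ == "__main__":
    counter = NonceCounter()
    print(counter.next_nonce().hex(), counter.next_nonce().hex())
```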


maqp 1787 days ago

Hadn't really looked into SIV, as I've only written stuff that always generates XChaCha nonces with getrandom, but yeah, I can totally see why the platform etc. could cause issues that lead to nonce reuse. This was a most informative post, thank you so much!


api 1787 days ago

SIV is usually done with AES/GMAC constructions but you could do it with ChaChaPoly just fine.

The big downside is that it requires two passes on encrypt: one to create the MAC and derive the IV, and another to encrypt. The overhead for this is small for message/packet based systems though, since after pass one the data will be sitting hot in the processor's L1 cache. Decryption can be done in one pass.
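To make the two passes concrete, here is a toy SIV-style construction using only the Python standard library. It is a sketch of the idea, not AES-SIV, AES-GCM-SIV, or anything you should deploy: HMAC-SHA256 stands in for both the MAC and the stream cipher (an assumption made purely so the example has no dependencies), and every function name is made up for illustration.

```python
import hashlib
import hmac

TAG_LEN = 16


def _synthetic_iv(mac_key, nonce, plaintext):
    # Pass 1: MAC the nonce and the *plaintext*; the tag doubles as the IV.
    # The nonce is length-prefixed so (nonce, plaintext) pairs are unambiguous.
    data = len(nonce).to_bytes(8, "big") + nonce + plaintext
    return hmac.new(mac_key, data, hashlib.sha256).digest()[:TAG_LEN]


def _keystream(enc_key, iv, length):
    # Toy stream cipher: HMAC in counter mode, keyed separately from the MAC.
    blocks = [
        hmac.new(enc_key, iv + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def siv_encrypt(mac_key, enc_key, nonce, plaintext):
    iv = _synthetic_iv(mac_key, nonce, plaintext)
    # Pass 2: encrypt under the IV derived from the message itself.
    return iv + _xor(plaintext, _keystream(enc_key, iv, len(plaintext)))


def siv_decrypt(mac_key, enc_key, nonce, sealed):
    iv, body = sealed[:TAG_LEN], sealed[TAG_LEN:]
    plaintext = _xor(body, _keystream(enc_key, iv, len(body)))
    # The tag can only be checked after decrypting, since it covers plaintext.
    if not hmac.compare_digest(iv, _synthetic_iv(mac_key, nonce, plaintext)):
        raise ValueError("authentication failed")
    return plaintext
```

Because the IV depends on the message, reusing a nonce with two different messages still yields unrelated keystreams; only an identical (nonce, message) pair produces identical output.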


forty 1785 days ago

Aren't you supposed to Mac the encrypted data?


staticassertion 1787 days ago

> You can't just do SHA256(key + message) to generate a safe MAC.

Can you explain this?


dagenix 1787 days ago

A SHA-256 hash is just a dump of the function's internal state. If you know the hash, you can keep running the hash function over more data and compute a valid hash for the original data with new data appended.


maqp 1787 days ago

What @dagenix said. See e.g. Thomas Pornin's answer here https://crypto.stackexchange.com/a/3979 for more details


stouset 1787 days ago

If you have the output

    h = SHA-256(k || m1)

you can easily compute a function `F(h, m2)` such that

    SHA-256(k || m1 || m2) = F(h, m2)

allowing you to forge a verifier for `m1 || m2` under `k` for any `m2` you wish without actually knowing `k`.
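A minimal sketch of such an `F`, in pure Python and checked against `hashlib`. One detail the notation above glosses over: the forged message is really `m1 || glue || m2`, where `glue` is the padding SHA-256 appended to `k || m1`, and the attacker needs to know or guess `len(k)`. The names `compress`, `md_padding`, and `length_extend` are my own; the round constants are derived from cube roots of primes as in the spec rather than typed out.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


# Round constants: first 32 fractional bits of the cube roots of 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """SHA-256 compression function: absorb one 64-byte block into state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def md_padding(total_len):
    """Padding SHA-256 appends to a message of total_len bytes."""
    zeros = b"\x00" * ((55 - total_len) % 64)
    return b"\x80" + zeros + struct.pack(">Q", total_len * 8)


def length_extend(digest, known_len, suffix):
    """From h = SHA-256(x) and len(x) alone, return (glue, SHA-256(x || glue || suffix))."""
    glue = md_padding(known_len)
    state = list(struct.unpack(">8I", digest))  # the digest IS the internal state
    tail = suffix + md_padding(known_len + len(glue) + len(suffix))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state)


if __name__ == "__main__":
    k, m1, m2 = b"sixteen-byte-key", b"amount=100&to=alice", b"&to=mallory"
    h = hashlib.sha256(k + m1).digest()
    glue, forged = length_extend(h, len(k) + len(m1), m2)  # k itself is never used
    assert forged == hashlib.sha256(k + m1 + glue + m2).digest()
```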


wvh 1787 days ago

You can with the truncated versions, though.


ptomato 1787 days ago

Single-threaded BLAKE3 is about 1.9x faster than (hardware) SHA-256 on my Zen 2 CPU.
