by wahern 1633 days ago
AFAICT, BLAKE2s (previously SHA-1) is only being used for the forward secrecy element, in this case mixing a hash of the pool back into the core state, which is actually still using ChaCha20 for expansion. From quick inspection (never read this code before) I think the number of bytes in the pool is 416 (104 * 4). (See poolwords defined at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...) For such relatively small messages and based on the cycles-per-byte performance numbers from https://w3.lasca.ic.unicamp.br/media/publications/p9-faz-her... (SHA-NI benchmarks), https://www.blake2.net/blake2_20130129.pdf (BLAKE2 paper), and https://bench.cr.yp.to/results-hash.html (comprehensive table of measurements), I don't see any performance reasons for choosing BLAKE2s over SHA-256. Rather, software SHA-256 and BLAKE2s seem comparable (and that's being charitable to BLAKE2s), and SHA-NI is definitely faster.

Perhaps there were other considerations at play. Maybe something as simple as the author's preference. One thing that probably wasn't a consideration is FIPS compliance--the core function is ChaCha20 so FIPS kernels require a completely different CSPRNG, anyhow.


https://github.com/BLAKE3-team/BLAKE3/blob/master/media/spee... argues BLAKE2s is twice as fast compared to SHA256.

One aspect switching from SHA1 to BLAKE2s does is it increases the total entropy a single compression operation adds to ChaCha20. This increases speed when folded BLAKE2s adds 128 bits per operation instead of folded SHA-1 that adds 80 bits. So that's two calls instead of four (I'm assuming they kept the folding). Another speedup comes from the fact that the hash function constants are no longer being filled with RDRAND output on every call.
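The "two calls instead of four" figure is just arithmetic on the folded digest sizes. A minimal sketch, assuming that "folding" means XOR-ing the two halves of the digest together and that the goal is to cover a 256-bit ChaCha20 key; `fold` and `calls_needed` are hypothetical helper names, and the kernel's actual fold may differ in detail:

```python
import hashlib
from math import ceil

def fold(digest: bytes) -> bytes:
    """XOR the two halves of a digest together (hypothetical fold)."""
    half = len(digest) // 2
    return bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))

def calls_needed(digest_bits: int, key_bits: int = 256) -> int:
    """Number of folded hash outputs needed to cover key_bits (assumed 256)."""
    folded_bits = digest_bits // 2
    return ceil(key_bits / folded_bits)

sha1_folded = fold(hashlib.sha1(b"pool contents").digest())
blake2s_folded = fold(hashlib.blake2s(b"pool contents").digest())

print(len(sha1_folded) * 8, calls_needed(160))     # 80 4
print(len(blake2s_folded) * 8, calls_needed(256))  # 128 2
```

SHA-256 has the same 256-bit digest as BLAKE2s, so under these assumptions it would also need two calls.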

Finally, I'm not completely sure whether increasing the hash size itself adds computational security against an attack where the internal state is compromised once and the attacker tries to brute force the new state based on new output. My conjecture is that the reseeding operation is atomic, i.e. that ChaCha20 won't yield anything until the reseed is complete, so there shouldn't be any difference in this regard. I'd appreciate clarification wrt this.

> argues BLAKE2s is twice as fast compared to SHA256.

That's for 16KiB inputs.

> One aspect switching from SHA1 to BLAKE2s does is it increases the total entropy a single compression operation adds to ChaCha20. This increases speed when folded BLAKE2s adds 128 bits per operation instead of folded SHA-1 that adds 80 bits.

But the question was why BLAKE2s instead of SHA-256, not SHA-1. SHA-256 has the same digest length as BLAKE2s.

> That's for 16KiB inputs.

BLAKE3 needs 16 KiB of input to hit the numbers in that bar chart, but BLAKE2s doesn't. It'll maintain its advantage over SHA-256 all the way down to the empty string. You can see this in Figure 3 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak.... (BLAKE3 is also faster than SHA-256 all the way down to the empty string, but not by as large a margin as the 16 KiB measurements suggest.)

On the other hand, these measurements were done on machines without SHA-256 hardware acceleration. If you have that (and Intel chips from the past year do), then SHA-256 does a lot better of course.
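A rough way to check the small-input behaviour on your own machine is a micro-benchmark like the one below. It is only a sketch: it measures Python's `hashlib` bindings, not the kernel code, so per-call overhead dominates at small sizes, and whether SHA-256 uses hardware acceleration depends on the underlying library build. The helper name `ns_per_hash` and the chosen sizes (empty input, the 416-byte pool size mentioned upthread, and 16 KiB) are illustrative assumptions.

```python
import hashlib
import timeit

def ns_per_hash(name: str, size: int, number: int = 5000) -> float:
    """Average wall-clock nanoseconds to hash `size` zero bytes with hashlib.<name>."""
    data = bytes(size)
    ctor = getattr(hashlib, name)
    total = timeit.timeit(lambda: ctor(data).digest(), number=number)
    return total / number * 1e9

for size in (0, 416, 16 * 1024):
    for name in ("sha256", "blake2s"):
        print(f"{name:8s} {size:6d} B  {ns_per_hash(name, size):10.0f} ns")
```

The numbers vary from machine to machine, which is exactly the point: the relative ranking depends on whether SHA-256 acceleration is present.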

Ah, the low eBASH numbers must be for SHA-NI. Looking at older CPUs, SHA-256 ends up being about 2x slower than BLAKE2s at 576-byte message sizes.

>That's for 16KiB inputs.

Oh that's a good point.

>But the question was why BLAKE2s instead of SHA-256, not SHA-1. SHA-256 has the same digest length as BLAKE2s.

Two things come to mind. Firstly, does speed really matter here? The reseeding interval of ChaCha20 DRNG (i.e. BLAKE2 call frequency) is 300 seconds and it runs in the order of milliseconds. The best bang for the buck at this point would come from ChaCha-NI.

Secondly, there's the aspect of reducing reliance on an algorithm that is vulnerable to length extension attacks. While LRNG itself doesn't directly benefit from BLAKE2s's indifferentiability, the switch helps phase out SHA-2, which is less misuse-resistant and might be misused elsewhere.
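To illustrate the misuse-resistance point outside the kernel, here is a small sketch using Python's `hashlib` and `hmac`. The key and message are made-up values. With SHA-256, the tempting `H(key || msg)` pattern is the one exposed to length extension, so a MAC has to go through HMAC; BLAKE2s offers a keyed mode directly.

```python
import hashlib
import hmac

key = bytes(range(32))   # hypothetical 32-byte secret
msg = b"example message"

# Pattern that invites length extension with SHA-256: H(key || msg).
naive_tag = hashlib.sha256(key + msg).digest()

# SHA-256 needs the HMAC construction to be used safely as a MAC.
hmac_tag = hmac.new(key, msg, hashlib.sha256).digest()

# BLAKE2s has a keyed mode built in; no extra construction is needed.
blake2s_tag = hashlib.blake2s(msg, key=key).digest()

print(naive_tag.hex())
print(hmac_tag.hex())
print(blake2s_tag.hex())
```

This does not demonstrate the attack itself; it only shows which construction each hash requires to avoid it.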

(Finally, no more pointless flame wars about "An algorithm created by the NSA is being used in the LRNG!!")

> The reseeding interval of ChaCha20 DRNG (i.e. BLAKE2 call frequency) is 300 seconds and it runs in the order of milliseconds.

I would guess microseconds.
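Either way, the cost is negligible relative to the reseed interval. A back-of-the-envelope sketch, assuming two hash calls per reseed (the figure guessed upthread) and the 300-second interval quoted above; `reseed_overhead` is a hypothetical helper:

```python
RESEED_INTERVAL_S = 300.0  # interval quoted in the thread

def reseed_overhead(call_seconds: float, calls_per_reseed: int = 2) -> float:
    """Fraction of wall-clock time spent hashing, per reseed interval."""
    return call_seconds * calls_per_reseed / RESEED_INTERVAL_S

print(f"{reseed_overhead(1e-3):.1e}")  # if one call took a millisecond
print(f"{reseed_overhead(1e-6):.1e}")  # if one call took a microsecond
```

Even at a millisecond per call, the hashing would account for only a few millionths of the interval.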