Hacker News
by colmmacc 902 days ago
You say to use SHA2, but TFA says to use SHA3 or Blake. I think your recommendation is the better one, but I feel like teasing out why because it's interesting.

Firstly ... the NIST recommendation TFA links doesn't just recommend SHA-3, it actually says "Federal agencies should use SHA-2 or SHA-3 as an alternative to SHA-1." SHA-2 and SHA-3 are both valid hash functions recommended by NIST. And while 3 is higher than 2, and SHA-3 is newer, in this case that doesn't mean "better". Being based on Keccak and a sponge construction, SHA-3 provides "diversity" more than "improvement".

Secondly ... SHA-2 is widely implemented in existing hardware, and it's just currently more efficient (and likely to remain so). So why waste power, especially on something you'll be doing in bulk?

O.k., so that's why SHA-2 and not SHA-3. But Blake is worth avoiding IMO ... because FISMA says that for Federal work, you have to use one of the algorithms NIST recommends in FIPS. Obviously the legal profession needs to be able to practice in Federal courts (Article III and administrative) ... so if you're going to pick a new standard, pick one of those (but not SHA-3!).

Lastly, and this is really an aside ... it's not uncommon for some folks to think SHA3 and SHA384 are the same thing, but they are not. SHA384 is just a variant of SHA-2 with a 384-bit digest length and a correspondingly improved security margin. Other Federal standards, like CNSA, separately recommend SHA384 as a good minimum ... so it can be confusing, and I think it's understandable that some people assume SHA3 is just short for it.
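To make the distinction concrete, a small sketch using Python's standard hashlib module (the input bytes are arbitrary). SHA-384 and SHA3-384 share a digest length and nothing else: SHA-384 is computed like SHA-512 with different initial values and a truncated output, while SHA3-384 is a Keccak sponge.

```python
import hashlib

data = b"example document"  # arbitrary input for illustration

# SHA-384: SHA-2 family (SHA-512 internals, different IVs, output truncated to 384 bits).
sha2_384 = hashlib.sha384(data)
# SHA3-384: SHA-3 family (Keccak sponge construction).
sha3_384 = hashlib.sha3_384(data)

print(sha2_384.name, sha2_384.digest_size * 8)  # sha384 384
print(sha3_384.name, sha3_384.digest_size * 8)  # sha3_384 384
# Same digest length, unrelated functions, so the outputs differ.
print(sha2_384.hexdigest() == sha3_384.hexdigest())  # False
```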


BLAKE3 is faster than hardware-accelerated SHA-2 because the tree mode used in BLAKE3 allows parts of a single message to be hashed in parallel (with SHA-2, the parts of a single message have to be hashed one after another, so parallelism can only be exploited in workloads that process multiple messages at the same time).

https://github.com/minio/sha256-simd

https://github.com/BLAKE3-team/BLAKE3
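To illustrate why a tree mode can put several cores on one message while a linear chain cannot, here is a toy Python sketch. It is not BLAKE3's actual tree mode: SHA-256 stands in as the node function, and the 1 KiB chunk size and the 0x00/0x01 leaf/parent prefixes are hypothetical choices. The leaves don't depend on each other, so they can be handed to a thread pool; in a Merkle-Damgaard chain, block i needs the chaining value from block i-1.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024  # hypothetical chunk size for this toy


def leaf_hash(chunk: bytes) -> bytes:
    # Leaves depend only on their own chunk, so they can be computed in any order.
    return hashlib.sha256(b"\x00" + chunk).digest()


def parent_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def toy_tree_hash(data: bytes, workers: int = 4) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # All leaves are hashed concurrently; pool.map keeps the results in chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(leaf_hash, chunks))
    # Merge pairwise up to the root, carrying an unpaired node up unchanged.
    while len(level) > 1:
        merged = [parent_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])
        level = merged
    return level[0]


data = bytes(range(256)) * 40  # 10 KiB, i.e. 10 chunks
# One worker or eight, the root is the same: only the scheduling differs.
print(toy_tree_hash(data, workers=1) == toy_tree_hash(data, workers=8))  # True
```

Whether the threads give a real speedup here depends on the interpreter and hash implementation; the point is only that the per-chunk work is independent.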

SHA-3 (and likewise BLAKE2 or BLAKE3) is definitely more secure than SHA-2 for very large documents.

The security (i.e. the difficulty in finding collisions) decreases with the length of the document for SHA-2 and it stays constant for SHA-3.

Nevertheless, typical legal documents are unlikely to be big enough for this to matter, except perhaps when hashing entire seized HDDs or SSDs, so SHA-2 is an acceptable choice for replacing MD5.
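For the MD5-replacement case, a minimal Python sketch of hashing a large file (say, a disk image) with SHA-256, reading it in fixed-size pieces so memory use stays constant. The function name and the 1 MiB read size are arbitrary choices.

```python
import hashlib


def sha256_file(path: str, read_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in read_size pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for block in iter(lambda: f.read(read_size), b""):
            h.update(block)
    return h.hexdigest()
```

On Python 3.11+ the standard library also offers hashlib.file_digest(f, "sha256") for the same job.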

The only reason SHA-3 has not become widespread is that, like AES, it requires hardware support for good performance, and for some reason Intel has not added a SHA-3 instruction to the x86 ISA. Arrow Lake S, expected to launch at the end of this year, will add support for SHA-512 and for the Chinese standard hashes, but there is still no plan to add SHA-3 (as Arm already has).

Neither AES nor SHA-3 is advisable on CPUs without dedicated instructions, because of their low performance there, but with hardware support they become faster than any alternative. The difference between them is that nowadays only the cheaper microcontrollers lack AES support, while SHA-3 support is still rarely encountered.

> The security (i.e. the difficulty in finding collisions) decreases with the length of the document for SHA-2

Could you spell this part out for me?

Twenty years ago this paper was considered surprising, and together with a handful of other attacks, and with the concrete attacks against MD5 and SHA-1 achieved by some Chinese researchers, it prompted the organization of the SHA-3 competition.

John Kelsey and Bruce Schneier: "Second Preimages on n-bit Hash Functions for Much Less than 2^n Work"

https://eprint.iacr.org/2004/304

The abstract at this link provides the essential results.

"We provide a second preimage attack on all n-bit iterated hash functions with Damgaard-Merkle strengthening and n-bit intermediate states, allowing a second preimage to be found for a 2^k-message-block message with about k * 2^(n/2+1) + 2^(n-k+1) work. Using SHA-1 as an example, our attack can find a second preimage for a 2^60 byte message in 2^106 work, rather than the previously expected 2^160 work."
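Plugging numbers into the quoted bound shows how the cost shrinks as the message grows. A small Python sketch (the function name is made up) computing log2 of k * 2^(n/2+1) + 2^(n-k+1). For SHA-1, 64-byte blocks mean a 2^60-byte message has 2^54 blocks, so k = 54, which gives roughly 2^107, within a factor of two of the abstract's 2^106 (the bound is stated as approximate). The SHA-256 rows use illustrative message lengths.

```python
import math


def second_preimage_log2_work(n: int, k: int) -> float:
    """log2 of k * 2^(n/2+1) + 2^(n-k+1): approximate work for a 2^k-block message
    and an iterated hash with an n-bit chaining value, per the quoted abstract."""
    work = k * 2.0 ** (n / 2 + 1) + 2.0 ** (n - k + 1)
    return math.log2(work)


# SHA-1: n = 160; a 2^60-byte message is 2^54 64-byte blocks, so k = 54.
print(round(second_preimage_log2_work(160, 54), 1))  # 107.0

# SHA-256: n = 256, ideal second-preimage cost 2^256; longer messages lower it.
for k in (10, 30, 50):
    print(k, round(second_preimage_log2_work(256, k)))  # 10 247, then 30 227, then 50 207
```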

Besides this result, there is also the earlier result obtained by Joux for multicollisions, which likewise become easier to find for longer inputs (Antoine Joux: "Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions").
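Joux's construction is easy to demonstrate on a toy iterated hash: find t single-block collisions in sequence, each starting from the previous chaining value, and any choice of one block from each pair reaches the same final state, so t birthday searches of about 2^(n/2) work each yield 2^t colliding messages. A minimal Python sketch, assuming a deliberately weak 16-bit compression function built from truncated SHA-256 (hypothetical, chosen only so collisions are found instantly). Length padding is omitted; all the messages have the same length, so it would be identical for all of them anyway.

```python
import hashlib
from itertools import product

STATE_BYTES = 2  # toy 16-bit chaining value so birthday searches are instant
IV = b"\x00" * STATE_BYTES


def compress(h: bytes, block: bytes) -> bytes:
    # Toy compression function: truncated SHA-256 of (chaining value || block).
    return hashlib.sha256(h + block).digest()[:STATE_BYTES]


def toy_md_hash(blocks) -> bytes:
    # Plain Merkle-Damgaard chaining with no block counter.
    h = IV
    for block in blocks:
        h = compress(h, block)
    return h


def find_block_collision(h: bytes):
    # Birthday search over 8-byte counter blocks until two share an output.
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(8, "big")
        out = compress(h, block)
        if out in seen:
            return seen[out], block, out
        seen[out] = block
        i += 1


def joux_multicollision(t: int):
    # t sequential collisions give 2^t distinct messages with one digest.
    pairs = []
    h = IV
    for _ in range(t):
        b0, b1, h = find_block_collision(h)
        pairs.append((b0, b1))
    return [list(choice) for choice in product(*pairs)]


messages = joux_multicollision(4)
digests = {toy_md_hash(m) for m in messages}
print(len(messages), len(digests))  # 16 1
```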

Thanks! I've added a note about this here: https://github.com/oconnor663/bao/issues/41#issuecomment-119.... Does that sound like an accurate summary to you?

It is OK, but the problems demonstrated by these attacks and others similar to them are specific to the linear chaining of a great number of blocks, where the same hashing function is applied to all blocks.

Somehow incorporating a block counter into the block hashing function, or using a sponge structure, is equivalent to using a different hashing function for each block, which stops these attacks.
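A toy Python sketch of the difference, assuming a deliberately weak 16-bit compression function built from truncated SHA-256 and 8-byte blocks (all names are hypothetical). Without a counter, every block uses the same function, so one "linking" block that maps the IV to any intermediate chaining value of a long message lets the attacker splice on the rest of that message, and the more blocks there are, the more targets there are to hit. With the block index mixed in (HAIFA-style), the spliced suffix gets processed at the wrong positions, so the shortcut generally no longer lines up. The sketch omits length padding, which is exactly what Kelsey and Schneier's expandable messages are built to get around.

```python
import hashlib

STATE_BYTES = 2  # toy 16-bit chaining value so searches finish instantly
IV = b"\x00" * STATE_BYTES


def compress(h: bytes, block: bytes, counter=None) -> bytes:
    # counter=None: plain chaining, the same function for every block.
    # Otherwise the block index is hashed in, so each position gets its own function.
    ctr = b"" if counter is None else counter.to_bytes(8, "big")
    return hashlib.sha256(h + ctr + block).digest()[:STATE_BYTES]


def chain(blocks, use_counter=False):
    # Return every intermediate chaining value (no length padding in this toy).
    h, states = IV, []
    for i, block in enumerate(blocks):
        h = compress(h, block, i if use_counter else None)
        states.append(h)
    return states


def splice_second_preimage(original):
    # Find one block mapping the IV to some intermediate state h_j of the original,
    # then reuse the original blocks after position j unchanged.
    targets = {}
    for j, h in enumerate(chain(original)):
        targets.setdefault(h, j)
    i = 0
    while True:
        link = b"L" + i.to_bytes(7, "big")
        j = targets.get(compress(IV, link))
        if j is not None:
            return [link] + original[j + 1:]
        i += 1


original = [b"M" + i.to_bytes(7, "big") for i in range(1024)]
forged = splice_second_preimage(original)
print(forged != original, chain(forged)[-1] == chain(original)[-1])  # True True
# Same splice under the counter variant: the suffix now sits at the wrong indices.
print(chain(forged, use_counter=True)[-1] == chain(original, use_counter=True)[-1])
```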

Tree hashing where the final branches, i.e. the chunks, are short, so that only a small number of blocks are hashed in cascade with the same function, behaves differently from linear block chaining. The hashing function must include additional inputs anyway, so the blocks within a chunk are hashed differently from the blocks that merge tree branches and from the root block.
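A rough Python sketch of the domain separation described above, assuming SHA-256 as a stand-in node function, a hypothetical 64-byte chunk size, and made-up one-byte CHUNK, PARENT and ROOT flags (loosely modeled on the idea behind BLAKE3's flags, not its real encoding). The flag byte makes chunk, parent and root hashing distinct functions, and each leaf also commits to its chunk index.

```python
import hashlib

CHUNK_SIZE = 64  # hypothetical: tiny chunks, so few blocks are chained per leaf
CHUNK, PARENT, ROOT = 0x01, 0x02, 0x04  # hypothetical domain-separation flags


def node(flags: int, payload: bytes) -> bytes:
    # The flag byte turns chunk, parent, and root hashing into different functions.
    return hashlib.sha256(bytes([flags]) + payload).digest()


def tree_hash(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    if len(chunks) == 1:
        # A single chunk is itself the root.
        return node(CHUNK | ROOT, (0).to_bytes(8, "big") + chunks[0])
    # Each leaf also commits to its chunk index, so chunks cannot be reordered.
    level = [node(CHUNK, i.to_bytes(8, "big") + c) for i, c in enumerate(chunks)]
    # Merge with PARENT until two nodes remain (an unpaired node is carried up).
    while len(level) > 2:
        merged = [node(PARENT, level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])
        level = merged
    # Only the final merge is flagged as the root.
    return node(PARENT | ROOT, level[0] + level[1])
```

Swapping two chunks, e.g. tree_hash(b"A" * 64 + b"B" * 64) versus tree_hash(b"B" * 64 + b"A" * 64), changes the root, because the chunk index is part of each leaf's input.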

Whether any of the attacks that exist for simple Merkle-Damgaard cascading like SHA-2 are applicable to a tree hash depends on how the blocks are encoded and on the structure of the tree. In any case, the probabilities of success will be different in the case of a tree hash.

I have never analyzed whether the BLAKE2 or BLAKE3 tree hashes could be secure enough without block counters. In any case, the block counters, which are absolutely necessary for linear block chaining, also increase the strength of tree hashing against any attacks and their added overhead is minimal, so there would be no reason to remove them.

A hash computed with a sponge structure, like SHA-3 or KangarooTwelve, is equivalent to a CBC-MAC where the key is updated at each block instead of being constant. Because each block is hashed with a different key, there is no need for an additional counter input to differentiate the hashing functions. A sponge structure is much simpler than the structure used by BLAKE, but it needs a wider invertible mixing transformation: e.g. for similar strength, BLAKE2b uses a 1024-bit mixing transformation, while SHA-3 uses a 1600-bit one. (BLAKE3 trades off strength for speed, so it is comparable not with SHA-3 but only with KangarooTwelve.)
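To make the sponge structure concrete, a toy Python sketch. Everything in it is hypothetical: a 256-bit state of four 64-bit words split into a 128-bit rate and a 128-bit capacity, a made-up add-rotate-xor permutation that is invertible (the inverse is included to show it) but has no security whatsoever, and pad10*1-style padding. It is not Keccak. Note that there is no counter: each absorbed block is mixed into whatever state the previous blocks left behind.

```python
MASK = (1 << 64) - 1
ROUNDS = 8  # hypothetical round count
RC = [(0x9E3779B97F4A7C15 * (r + 1)) & MASK for r in range(ROUNDS)]  # round constants
RATE_BYTES = 16  # rate = state words 0 and 1; words 2 and 3 are the hidden capacity


def rotl(x, n):
    return ((x << n) | (x >> (64 - n))) & MASK


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK


def permute(state):
    # Toy ARX permutation on four 64-bit words; every step is invertible.
    a, b, c, d = state
    for r in range(ROUNDS):
        d ^= RC[r]
        a = (a + b) & MASK
        d = rotl(d ^ a, 32)
        c = (c + d) & MASK
        b = rotl(b ^ c, 24)
        a = (a + b) & MASK
        d = rotl(d ^ a, 16)
        c = (c + d) & MASK
        b = rotl(b ^ c, 63)
    return [a, b, c, d]


def inverse_permute(state):
    # Undo each step of permute in reverse order.
    a, b, c, d = state
    for r in reversed(range(ROUNDS)):
        b = rotr(b, 63) ^ c
        c = (c - d) & MASK
        d = rotr(d, 16) ^ a
        a = (a - b) & MASK
        b = rotr(b, 24) ^ c
        c = (c - d) & MASK
        d = rotr(d, 32) ^ a
        a = (a - b) & MASK
        d ^= RC[r]
    return [a, b, c, d]


def pad(msg: bytes) -> bytes:
    # pad10*1-style: a 1 bit, zeros, then a final 1 bit, up to a multiple of the rate.
    padded = bytearray(msg) + b"\x01"
    while len(padded) % RATE_BYTES:
        padded.append(0)
    padded[-1] |= 0x80
    return bytes(padded)


def toy_sponge_hash(msg: bytes, out_len: int = 32) -> bytes:
    state = [0, 0, 0, 0]
    data = pad(msg)
    # Absorb: XOR each rate-sized block into the rate words, then permute.
    for off in range(0, len(data), RATE_BYTES):
        state[0] ^= int.from_bytes(data[off:off + 8], "little")
        state[1] ^= int.from_bytes(data[off + 8:off + 16], "little")
        state = permute(state)
    # Squeeze: read the rate words, permuting between output blocks.
    out = b""
    while True:
        out += state[0].to_bytes(8, "little") + state[1].to_bytes(8, "little")
        if len(out) >= out_len:
            return out[:out_len]
        state = permute(state)


print(len(toy_sponge_hash(b"abc")))  # 32
# The inverse really undoes the permutation, i.e. it is a bijection on the state.
print(inverse_permute(permute([1, 2, 3, 4])) == [1, 2, 3, 4])  # True
```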

It feels like you're just hunting for complexity a bit here. Any of SHA2, SHA3, or Blake are viable options.

I'd only let NIST and FIPS drive my crypto choices if I were being actively forced to do so by a government use case, and even so I'd be praying for the day that FIPS joins us in the modern era and stops tethering us to acronyms they could enumerate a decade ago.

He's hunting for complexity because it's interesting to talk about!