by raziel2p 3400 days ago
A bit hard for me to tell what happened here, maybe because I don't know anything about SVN. The two PDFs with equal SHA1 hashes were committed to the git repository, but converting that to an SVN commit failed because... SVN can't handle two separate files with the same SHA1 hash?

This might be at fault:

> Subversion 1.8 avoids downloading pristine content that is already present in the cache, based on the content's SHA1 or MD5 checksum.

https://subversion.apache.org/docs/release-notes/1.8.html#pr...

It's likely some part of the svn implementation that assumes that the SHA1 signatures guarantee uniqueness within a repo. And they might use that hash as an identifier.

I'm guessing shattered-1.pdf and shattered-2.pdf have identical hashes but distinct contents. It's not clear to me why this results in a "checksum mismatch."

    Checksum mismatch: LayoutTests/http/tests/cache/disk-cache/resources/shattered-2.pdf
    expected: 5bd9d8cabc46041579a311230539b8d1
        got: ee4aa52b139d925f8d8884402b0a750c
EDIT: see https://news.ycombinator.com/item?id=13725312 for the answer
Heh, because those are the MD5 checksums, which don't match. The SHA1 sums are identical:

  $ sha1sum shattered*
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

  $ md5sum shattered*
  ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf
  5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf
As you can see.
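
For anyone without the coreutils tools handy, the same check can be sketched in Python with the standard `hashlib` module. The helper names `digests` and `same_sha1_different_md5` are made up for illustration; this is only a rough equivalent of running `sha1sum` and `md5sum` side by side, not anything SVN or git does internally.

```python
import hashlib


def digests(data: bytes) -> dict[str, str]:
    """Return the hex MD5 and SHA-1 digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def same_sha1_different_md5(a: bytes, b: bytes) -> bool:
    """True when two inputs agree on SHA-1 but disagree on MD5."""
    da, db = digests(a), digests(b)
    return da["sha1"] == db["sha1"] and da["md5"] != db["md5"]
```

Reading the two PDFs with `open(path, "rb").read()` and passing them to `same_sha1_different_md5` should return `True` if the files match the sums listed above.
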
Wouldn't using both sha1 and md5 solve our problem, or does the fact that both have known collisions doom that combination?
I asked this question yesterday. Apparently using both is not much more of a barrier than the stronger of the two by itself.

[1] https://news.ycombinator.com/item?id=13715146

[2] https://www.iacr.org/archive/crypto2004/31520306/multicollis...

A 256-bit hash, even a bad one, is usually more secure than a 128-bit hash, even a good one. But a 256-bit hash designed to be used as a 256-bit hash is probably going to be better than trying to come up with your own one ad-hoc by combining two 128-bit hashes. E.g. many hash functions have parts in common, so you might not get 256 independent bits that way.
In this scenario, the reasoning usually is: since both sha1 and md5 are vulnerable, it is possible to construct a pair of documents for which both the sha1 and the md5 match. I don't know how feasible that is, nor how much compute time it would add. But that is the typical argument against this type of "combine two hashes" approach.
No. You can't assume that two documents match because their hash values match. That's what caused this problem. You don't solve a problem by doing the same thing that caused the problem in the first place. For any given string s of unbounded length and its hash h, there are an infinite number of strings s', s'', s''', etc that have the same hash value h. Change from 128 to 256 bit hash? Great, there are still an infinite number of collisions. Change to two concatenated hashes? Guess what: infinite number of collisions.

A hash is not a unique identifier. Period. It's only useful as a quick filter before you do a full comparison.
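
A minimal sketch of that "filter first, then compare" idea, assuming a toy in-memory store. `DedupStore` and its `hash_fn` hook are hypothetical names for illustration; this is not how SVN's pristine cache is actually implemented.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Toy content store: the hash picks the bucket, the bytes decide equality."""

    def __init__(self, hash_fn=None):
        # hash_fn is a hypothetical hook so a deliberately weak hash
        # can be swapped in to provoke collisions.
        self._hash = hash_fn or (lambda b: hashlib.sha1(b).hexdigest())
        self._buckets = defaultdict(list)

    def add(self, data: bytes) -> tuple[str, int]:
        """Store data and return (digest, index within that digest's bucket)."""
        key = self._hash(data)
        bucket = self._buckets[key]
        for i, existing in enumerate(bucket):
            if existing == data:  # full comparison, not just the digest
                return key, i
        bucket.append(data)  # same digest, different bytes: keep both
        return key, len(bucket) - 1

    def get(self, key: str, index: int) -> bytes:
        """Fetch the stored bytes for a (digest, index) pair."""
        return self._buckets[key][index]
```

With a weak hash such as `lambda b: str(len(b))`, adding `b"aa"` and then `b"bb"` lands both in the same bucket but at different indexes, so neither silently replaces the other. The cost is that the identifier is now a (digest, index) pair rather than the digest alone.
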

It's amazing how resistant people are to using hashes safely. They willfully ignore the birthday paradox and say LA LA LA 1/(2^128)=0 and because they haven't lost all their data yet they tell themselves that their shoddy practices are OK.
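
To put rough numbers on the birthday paradox point, here is a sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash. It assumes an ideal hash and randomly chosen inputs, so it says nothing about deliberately constructed collisions like the shattered PDFs. The function name is made up for illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -(n_items * (n_items - 1)) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)
```

For example, `collision_probability(2**64, 128)` comes out at about 0.39, while a billion items in a 128-bit space gives a value that is tiny but not zero.
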

Or just using a single better hash would solve it (some variant of SHA-2). For the foreseeable future at least.

Edit: Mitigate it rather, not solve it.

They were directly committed to the SVN repository, apparently breaking SVN's tooling even after the commit had been deleted. The git-to-SVN mirror script was the first place where a failure was noticed and was initially thought to be the only broken bit.