Hacker News
by more_original 3397 days ago
The PDFs have the same size, but they do not have a header in the file that states their overall size. If PDF had a header at the beginning of the file that states the file size, then it could be harder to find a collision. From what I understand, the attack works by inserting garbage data after a fixed file prefix and before a fixed file suffix (anyone please correct me if I'm wrong).
2 comments

> If PDF had a header at the beginning of the file that states the file size, then it could be harder to find a collision.

No. Putting the size in the PDF header would not change anything. The two "shattered" PDFs already have the same size, and the header is identical in both files.

What Linus says is that if you tried to put these two PDF files in git, it would not see them as the same, because git calculates the SHA-1 differently: it hashes a short header with the object type and size ahead of the file content. But Google would be able to produce two PDF files that would, as git sees them, appear to be the same just as easily as these were produced.

P.S. (in reply to your reply to this message) Note that one level above you wrote:

> If PDF had a header at the beginning of the file that states the file size, then it could be harder to find a collision.

And I argued that it isn't harder; a size header is simply irrelevant.

From your answer:

> But to generate a collision with a different prefix q one would have to do the expensive computation all over again

Yes. Now read your claim again. It's not harder. It is exactly as hard as it was the first time.

> But Google would be able to produce two PDF files that would, as git sees them, appear to be same just as easy as these that were produced.

Right, but they would have to re-do their enormous calculation. ("This attack required over 9,223,372,036,854,775,808 SHA1 computations.")

Google started with a common prefix p (the PDF header), then computed blocks M11, M12, M21 and M22, such that (p || M11 || M21 || S) and (p || M12 || M22 || S) collide for any suffix S. Given p, M11, M12, M21 and M22, anyone can quickly make colliding PDFs that show different contents. But to generate a collision with a different prefix q, e.g. one including the file size, one would have to do the expensive computation all over again, I think.
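Here is a toy sketch of that structure, not the real attack. It assumes a made-up Merkle-Damgard hash (`toy_hash`) with a 32-bit state, built from truncated SHA-256, so that a brute-force birthday search finishes in well under a second. Everything here (`toy_hash`, `compress`, `find_collision`, the prefixes `p` and `q`) is hypothetical and only illustrates why a collision found for one prefix survives any common suffix but does not transfer to a different prefix.

```python
import hashlib
from itertools import count

BLOCK = 8                    # toy block size in bytes
IV = b"\x00\x00\x00\x00"     # toy 32-bit initial chaining state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:4]


def absorb(state: bytes, data: bytes) -> bytes:
    """Run the chaining state over block-aligned data."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_hash(msg: bytes) -> bytes:
    """Merkle-Damgard hash with length padding, like SHA-1 in miniature."""
    pad = b"\x80" + b"\x00" * (-(len(msg) + 1) % BLOCK)
    return absorb(IV, msg + pad + len(msg).to_bytes(BLOCK, "big"))


def find_collision(prefix: bytes):
    """Birthday search for two blocks that collide right after `prefix`."""
    state = absorb(IV, prefix)
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block


p = b"%PDF-1.3"              # one block standing in for the PDF header
q = b"size=42;"              # a different prefix, e.g. one with a size field
m1, m2 = find_collision(p)   # the "expensive" step, tied to prefix p

for suffix in (b"", b"page one", b"a much longer common suffix"):
    same = toy_hash(p + m1 + suffix) == toy_hash(p + m2 + suffix)
    print("prefix p, suffix", suffix, "collides:", same)

# With prefix q the chaining state before m1/m2 differs, so the old blocks
# are useless and find_collision(q) would have to be run from scratch.
print("prefix q collides:", toy_hash(q + m1) == toy_hash(q + m2))
```

Note that both colliding messages have the same length, so a size field inside the prefix would be identical in both files. It becomes part of p, and the search costs the same as before.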

Note: I'm not trying to argue that SHA-1 can be made secure with padding. I was just trying to say that the statement "The PDFs have the same size" misses the point.

Why would that make the attack harder? Both PDFs are the same length.