Hacker News new | ask | show | jobs
by gruez 902 days ago
Generating 2^64 hashes isn't guaranteed to produce a collision, and even if a collision did exist in that set, you're not going to find it by getting a bunch of GPUs to compute 2^64 hashes. There's a huge difference between a haystack that maybe contains a needle, and a needle that's been pulled from the haystack and presented to you. To actually find and identify the collisions you'll need to hook those GPUs up to some sort of storage/retrieval system. Just to store 2^64 128-bit hashes would take 295.1 exabytes. That's an order of magnitude more storage than NSA's utah datacenter[1].

[1] https://en.wikipedia.org/wiki/Utah_Data_Center

1 comments

You can generate pairs of hashes for random inputs and check for collision without storing all of the outputs, no?
True, which is why 256-bit or bigger hashes are recommended to make sure that the time for finding a collision is alone great enough to make this impossible.

Finding a collision for an 160-bit hash by brute force, on a big cloud or supercomputer, is about the maximum that someone with unlimited financial resources could do.

That requires way more hashes to be computed though.
I don't think that's correct? It's probabilistic, yes, but in expectation you would still need ~2^64 hashes to find a collision for a 128-bit hash (birthday paradox).
See my above comment. There's a huge difference between having a collision somewhere in a pile of 2^64 hashes, and actually having the colliding hashes ready to present.
Are you trying to get at the distinction between second preimage and collision resistance? If all you need is a collision, "try random pairs until you find a collision" method works fine and is arbitrarily parallelizable with no storage.
>"try random pairs until you find a collision" method works fine and is arbitrarily parallelizable with no storage.

That requires you to do 2^128 checks on average, not 2^64. Again, the problem is that the birthday problem only exists if you have all the hashes available to compare. If you're just doing two hashes at a time and comparing the two, the chances of you getting a match for each try is 1 in 2^128, and this is independent for each attempt. To get a 50% chance you need 2^128 attempts.

If you can't see why that's the case, you can empirically test this, by using a 16-bit "hash" rather than a 128-bit hash: https://pastebin.com/7xW04Jgg

The average I got during one invocation was 58161, which is 2^15.8, not 2^8 as you'd expect. However, if you modify the code to include a storage/retrieval system: https://pastebin.com/AxF9S0q5, the average drops to near 2^8. For my last invocation, I got 309.7 which is 2^8.2