| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gruez 902 days ago
	Generating 2^64 hashes isn't guaranteed to produce a collision, and even if a collision did exist in that set, you're not going to find it by getting a bunch of GPUs to compute 2^64 hashes. There's a huge difference between a haystack that maybe contains a needle, and a needle that's been pulled from the haystack and presented to you. To actually find and identify the collisions you'll need to hook those GPUs up to some sort of storage/retrieval system. Just to store 2^64 128-bit hashes would take 295.1 exabytes. That's an order of magnitude more storage than NSA's utah datacenter[1]. [1] https://en.wikipedia.org/wiki/Utah_Data_Center

1 comments

moyix 902 days ago

You can generate pairs of hashes for random inputs and check for collision without storing all of the outputs, no?

link

adrian_b 902 days ago

True, which is why 256-bit or bigger hashes are recommended to make sure that the time for finding a collision is alone great enough to make this impossible.

Finding a collision for an 160-bit hash by brute force, on a big cloud or supercomputer, is about the maximum that someone with unlimited financial resources could do.

link

gruez 902 days ago

That requires way more hashes to be computed though.

link

moyix 902 days ago

I don't think that's correct? It's probabilistic, yes, but in expectation you would still need ~2^64 hashes to find a collision for a 128-bit hash (birthday paradox).

link

gruez 902 days ago

See my above comment. There's a huge difference between having a collision somewhere in a pile of 2^64 hashes, and actually having the colliding hashes ready to present.

link

moyix 902 days ago

Are you trying to get at the distinction between second preimage and collision resistance? If all you need is a collision, "try random pairs until you find a collision" method works fine and is arbitrarily parallelizable with no storage.

link

gruez 902 days ago

>"try random pairs until you find a collision" method works fine and is arbitrarily parallelizable with no storage.

That requires you to do 2^128 checks on average, not 2^64. Again, the problem is that the birthday problem only exists if you have all the hashes available to compare. If you're just doing two hashes at a time and comparing the two, the chances of you getting a match for each try is 1 in 2^128, and this is independent for each attempt. To get a 50% chance you need 2^128 attempts.

If you can't see why that's the case, you can empirically test this, by using a 16-bit "hash" rather than a 128-bit hash: https://pastebin.com/7xW04Jgg

The average I got during one invocation was 58161, which is 2^15.8, not 2^8 as you'd expect. However, if you modify the code to include a storage/retrieval system: https://pastebin.com/AxF9S0q5, the average drops to near 2^8. For my last invocation, I got 309.7 which is 2^8.2

link