Hacker News new | ask | show | jobs
by Jerrrry 872 days ago
>If I'm following my intuitions about the math in the right direction

you are, except our theoretical familiarity with math and the antecedent nature of life can easily lead us to intuitions that mature to fallacies quickly.

https://en.wikipedia.org/wiki/Birthday_problem

>e.g. by matching at the very beginning and very end?)

Thankfully those smarter than us have solved this problem too - the "hashing" algorithm is so fundamentally lossy (but not too lossy to fall into the pidgen-hole paradox) 1-way, that it is mathematically impossible to have any knowledge of the end of the hash before you get it.

You can "brute-force" it backwards, sure (for some old hashes obviously) - give me a string that's MD5 starts with "Jerry" and ends with "loves math", and I will congratulate you on your waste of computational resources.

1 comments

Sure, but targeting similarity with a previously-chosen hash is a scenario where the birthday paradox doesn't come in. The case where it does would be "can we produce two arbitrary new hashes that are similar in this way?", in which case the amount of work required might be about the square root of what our intuition might suggest.

(although I think there's an explosion in the required space in that case because you need to store information about all of the values that you've already been able to produce, in order to learn whether new values collide with them!)

>"can we produce two arbitrary new hashes that are similar in this way?"

arbitrary may be the heavy lifter here, we can certainly birthday-paradox two address that look similar (square root, yes)

>(although I think there's an explosion in the required space in that case because you need to store information about all of the values that you've already been able to produce, in order to learn whether new values collide with them!)

bloom hash table a bloom hash table with some nerdy optimizations for backtracking, depending on whether your IO/CPU/GPU or network were the bottleneck. If you got a double-positive, skip the integer/nonce/etc.

Although, realistically, I'd be very surprised if in a quintilion PETAFLOPS you found a single 128bit number that, after being hashed twice, starts with "face" and ends with "book"

Arbitrary means: It's "easy" (square root) to find two numbers that resemble each other in a sufficiently large set, but neither of them will resemble anything meaningful. It's still "hard" to find a number that resembles a previously given different number, such as the bbcnews hash above. (The chance that any two kids in a room share a birthday is fairly high; the chance that a kid has their birthday on January 1st is much lower.)

> Although, realistically, I'd be very surprised if in a quintilion PETAFLOPS you found a single 128bit number that, after being hashed twice, starts with "face" and ends with "book"

We can just calculate it. "face" + "book" is 8 characters in base 64, for a total of 8*6=48 bits that need to be set a certain way. 2^48 is roughly 10^15. Hashing once or twice barely matters at this point (2*10^15 ~=~ 10^15). A quintillion petaflops is 10^33 flops, so unless your hashing algorithm takes 10^18 floating point operations, you have an incredibly high probability of finding such a number within a second.

nerd.

Just kidding - this was a much more clearer response with some actual math behind it, thank you.

This (may) explain a facet of .onion phishing attempts at a certain point in time.