|
|
|
|
|
by prettyrandom100
4121 days ago
|
|
Do you mean that there could be two 3K files with the same sha256 hash and the probability of hitting the collision is greater than the probability of finding the correct hash for the file ?
Let's divide the 3 mb file into 1000 parts, each having a ~3k size. Take sha256 hash of all 1000 parts and sha256 hash of the actual file. The size of these hashes are less than the actual file + leaves ample room for a decompressor. Now start brute-forcing and assume we have all the run-time power and time. Would the probability of collision happening in all 1000 parts be very low then ? Given a good hashing function could it be so low that we can disregard it ? If 1000 is not enough, can we increase the splits to 2000 or 3000 ? |
|
By the pigeonhole principle, we can't fit 2^25165824 objects into 2^256256 holes; indeed each file will have on average 2^24909568 other files that share the same set of 1001 sha256 hashes.
The reason that we are able to work with hashes, and that compression works in practice, is that we don't have to deal with every 3M file. Most of these files are gibberish and will never be seen, and finding two files that match even one hash is incredibly difficult. But once we start talking about brute-forcing, we start encountering problems -- and having to dedicate an awful lot of processing power to the problem isn't the biggest one...