|
|
|
|
|
by lvh
2307 days ago
|
|
You can do better than binary search here with interpolation search since the data is equidistributed. (You can do it for any data with a known distribution, but it's unlikely to be worth it for any data set that isn't linearly distributed.) Roughly, instead of picking at 1/2 every time, you pick b/256 in, where b is the current byte value. That gets you log log n search. (You can of course pick 2 bytes, or 3, or...) |
|
The fastest method would be to treating the 20-byte hash as a floating point number in the range 0.0 and 1.0. This would let you use Newton's iteration method to very accurately guess the approximate location in the file and then narrow it down from there. By reading small blocks of 0.5-4KB you could probably get to the required hash in just 1-2 reads, which is either optimal or very close to it.
Do clever things to simple data!