by stryku2393
2306 days ago
Thanks for pointing this out. You're right. As I said in a comment further up, the benchmarks are bad.
I benchmarked the best and the worst case with respect to the original file, so I looked up the first and the last hash. I completely missed that if I look up the same hash over and over again, I end up reading the same parts of the B-tree file, which are then easily cached. That's probably why it seemed so fast. I'll rewrite the benchmarks and update the results.
About the Bloom filter: I thought about it, but I wanted a solution that is 100% correct. The filter doesn't guarantee that, since it can report false positives.
About the sorting: I wanted to implement a stand-alone, cross-platform library that does everything for you. That's why everything is implemented in the library itself, without depending on or calling other tools.
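To illustrate the Bloom filter point, here is a minimal Python sketch, not code from the library. The `BloomFilter` class, its parameters, and the `hash-N` / `other-N` keys are all made up for the example, and the filter is deliberately undersized so the effect is easy to see. It shows that stored items are never missed, but items that were never added can still come back as "maybe present".

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def demo():
    # Assumption: a tiny 64-bit filter, overloaded on purpose.
    bloom = BloomFilter(num_bits=64, num_hashes=2)
    stored = [f"hash-{n}".encode() for n in range(40)]
    for item in stored:
        bloom.add(item)

    # Items that were added are always reported as possibly present.
    false_negatives = sum(not bloom.might_contain(item) for item in stored)

    # Items that were never added may still be reported as possibly present.
    absent = [f"other-{n}".encode() for n in range(1000)]
    false_positives = sum(bloom.might_contain(item) for item in absent)
    return false_negatives, false_positives


if __name__ == "__main__":
    fn, fp = demo()
    print(f"false negatives: {fn}, false positives: {fp} of 1000")
```

So a "no" from the filter can be trusted, but a "yes" would still have to be confirmed against the real index, which is why the filter alone is not an exact answer.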