by stryku2393 2306 days ago
Thanks for pointing this out.

You're right. As I said in another comment above, the benchmarks are bad. I benchmarked the best and the worst case with respect to the original file, so I only looked up the first and the last hash.

I totally missed that if I look up the same hash over and over again, I end up reading the same parts of the B-tree file, so they are easily cached. That's probably why it seemed so fast.
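One way to avoid the repeated-key caching effect is to query many different hashes in random order, mixing ones that are in the file with ones that are not. This is only a rough Python sketch under assumptions: `make_query_set` and its parameters are hypothetical names, not part of the library, and it assumes the hashes are uppercase SHA-1 hex strings.

```python
import hashlib
import random


def make_query_set(hashes, n_hits, n_misses, seed=0):
    """Build a shuffled list of (hash, expected_present) pairs.

    Hypothetical helper: spreads lookups over the whole key space so a
    benchmark does not keep hitting the same cached B-tree pages.
    """
    rng = random.Random(seed)  # fixed seed keeps benchmark runs comparable
    present = set(hashes)

    # Hashes known to be in the file, drawn from random positions.
    hits = rng.sample(sorted(present), n_hits)

    # Hashes that are not in the file, generated from random bytes.
    misses = []
    while len(misses) < n_misses:
        raw = rng.getrandbits(128).to_bytes(16, "big")
        candidate = hashlib.sha1(raw).hexdigest().upper()
        if candidate not in present:
            misses.append(candidate)

    queries = [(h, True) for h in hits] + [(m, False) for m in misses]
    rng.shuffle(queries)  # interleave hits and misses
    return queries
```

Timing the lookups over such a set (and ideally over a file larger than RAM, or with a cold page cache) should give numbers closer to real usage than looking up one hash repeatedly.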

I'll rewrite the benchmarks and update the results.

About the Bloom filter: I thought about it, but I wanted a solution that is 100% correct. The filter alone doesn't guarantee that, because it can return false positives.
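For context, here is a minimal Python sketch of that trade-off. It is not the library's code; `BloomFilter`, `contains`, and `exact_lookup` are hypothetical names. A Bloom filter never reports a stored item as absent, but it may report an absent item as present, so a "maybe" answer still needs an exact lookup to stay correct.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def contains(item, bloom, exact_lookup):
    """Exact membership test that uses the filter only to skip sure misses."""
    if not bloom.might_contain(item):
        return False  # definitely absent, no disk access needed
    return exact_lookup(item)  # "maybe": confirm with the exact structure
```

Used this way the filter only saves work on misses; the final answer always comes from the exact lookup (here, whatever `exact_lookup` wraps, for example the B-tree search).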

About the sorting: I wanted to implement a stand-alone, cross-platform library that does everything for you. That's why everything is implemented in the library itself, without depending on or calling other tools.