Hacker News new | ask | show | jobs
by stryku2393 2305 days ago
As I said in the article, I rejected binary search and similar approaches on purpose.

In general, I rejected all approaches that operates on a sorted file. That's simply because keeping the file sorted is slow, when you need to insert more data in the future. With B-tree approach, you have fast searching, as well as, inserting. That's the reason.

1 comments

The HIBP database is updated as infrequently as once a year.

Sorting a handful of gigabytes was a solved problem over 20 years ago. Sorting a file using an "online" operation like row-by-row insertion in a B-Tree is necessarily slower than any offline sort. Merge sort is particularly efficient with data that won't fit on disk.

If you do need "online" operation, a real database engine provides this (and more) for practically zero effort. It's almost as if... they were designed for this purpose!

There is no restriction on using okon only for HIBP passwords. Actually, the goal is to handle hashes, no matter where they come from. User may have reasons to update db frequently.

Yes, you're right that sorting data using B-tree is slower than other methods. But I don't use it only for sorting. It's not like 'create B-tree and forget about it'. You create the tree from initial data, and if you need to insert any more hashes there, you can do it really quickly, without a need to sort everything again.

I agree again. If I'd create okon only for my home usage, I'd probably use a real database. But I didn't want to depend on anything. I wanted to create a library that can be used by any program and just works, without forcing user to install anything else. That's why I sort the data on my own, and that's why I implemented B-tree on my own. Everything is in okon's codebase.