Hacker News new | ask | show | jobs
by jiggawatts 2305 days ago
The HIBP database is updated as infrequently as once a year.

Sorting a handful of gigabytes was a solved problem over 20 years ago. Sorting a file using an "online" operation like row-by-row insertion in a B-Tree is necessarily slower than any offline sort. Merge sort is particularly efficient with data that won't fit on disk.

If you do need "online" operation, a real database engine provides this (and more) for practically zero effort. It's almost as if... they were designed for this purpose!

1 comments

There is no restriction on using okon only for HIBP passwords. Actually, the goal is to handle hashes, no matter where they come from. User may have reasons to update db frequently.

Yes, you're right that sorting data using B-tree is slower than other methods. But I don't use it only for sorting. It's not like 'create B-tree and forget about it'. You create the tree from initial data, and if you need to insert any more hashes there, you can do it really quickly, without a need to sort everything again.

I agree again. If I'd create okon only for my home usage, I'd probably use a real database. But I didn't want to depend on anything. I wanted to create a library that can be used by any program and just works, without forcing user to install anything else. That's why I sort the data on my own, and that's why I implemented B-tree on my own. Everything is in okon's codebase.