by jiggawatts
2305 days ago
The HIBP database is updated as infrequently as once a year. Sorting a handful of gigabytes was a solved problem over 20 years ago. Sorting a file with an "online" operation such as row-by-row insertion into a B-tree is necessarily slower than any offline sort. Merge sort is particularly efficient with data that won't fit in memory. If you do need "online" operation, a real database engine provides this (and more) for practically zero effort. It's almost as if... they were designed for this purpose!
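For readers who haven't seen an offline sort of this kind, here is a minimal sketch of an external merge sort, assuming one hash per line. The names `external_sort`, `chunk_size` and `tmp_dir` are made up for illustration and come from neither okon nor HIBP. It sorts memory-sized chunks, writes each one out as a sorted run, then streams a k-way merge over the runs.

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size, tmp_dir):
    """Yield the given lines in sorted order while holding at most
    chunk_size lines in memory at a time (hypothetical helper)."""
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it out as one sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(line + "\n" for line in chunk)
        run_paths.append(path)
        chunk.clear()

    # Phase 1: split the input into sorted runs on disk.
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    # Phase 2: k-way merge of the runs, reading them lazily.
    files = [open(p) for p in run_paths]
    try:
        runs = [(l.rstrip("\n") for l in f) for f in files]
        yield from heapq.merge(*runs)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Every line is written once and read once, all sequentially, which is roughly why this approach copes well with inputs larger than RAM. A real tool would also cap the number of runs merged at once; this sketch does not.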
Yes, you're right that sorting data with a B-tree is slower than other methods. But I don't use it only for sorting. It's not 'create the B-tree and forget about it'. You build the tree from the initial data, and if you later need to insert more hashes, you can do it really quickly, without sorting everything again.
I agree again. If I had created okon only for my own use at home, I'd probably have used a real database. But I didn't want to depend on anything. I wanted to create a library that any program can use and that just works, without forcing the user to install anything else. That's why I sort the data myself, and that's why I implemented the B-tree myself. Everything is in okon's codebase.
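To illustrate the point about incremental inserts, here is a rough sketch in which a sorted Python list stands in for the B-tree. This is not okon's code, and `insert_hash` and `contains` are hypothetical names. The idea is the same: a new hash goes straight to its position, so nothing gets sorted again.

```python
import bisect


def insert_hash(sorted_hashes, new_hash):
    """Insert new_hash into an already sorted list, keeping it sorted.
    Returns False if the hash was already present."""
    i = bisect.bisect_left(sorted_hashes, new_hash)
    if i < len(sorted_hashes) and sorted_hashes[i] == new_hash:
        return False
    sorted_hashes.insert(i, new_hash)
    return True


def contains(sorted_hashes, h):
    """Binary search for h in the sorted list."""
    i = bisect.bisect_left(sorted_hashes, h)
    return i < len(sorted_hashes) and sorted_hashes[i] == h
```

A list still has to shift its elements on every insert, so this costs O(n) per insert. A B-tree avoids that by touching only a few nodes, which is presumably why one would pick it for an on-disk index.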