Hacker News new | ask | show | jobs
by muxator 1663 days ago
First of all, thanks for marginalia.nu.

Have you considered stored compressed blobs in a sqlite file? Works fine for me, you can do indexed searches on your "stored" data, and can extract single pages if you want.

1 comments

The main reason I'm doing it this way is because I'm saving this stuff to a mechanical drive, and I want consistent write performance and low memory overhead. Since it's essentially just an archive copy, I don't mind if it takes half an hour to chew through looking for some particular set of files. Since this is a format deigned for tape drives, it causes very little random access. It's important that it's relatively consistent to write since my crawler does while it's crawling, and it can reach speeds of 50-100 documents per second, which would be extremely rough on any sort of database based on a single mechanical hard drive.

These archives are just an intermediate stage that's used if I need to reconstruct the index to tweak say keyword extraction or something, so random access performance isn't something that is particularly useful.