by marginalia_nu
1663 days ago
The main reason I'm doing it this way is that I'm saving this stuff to a mechanical drive, and I want consistent write performance and low memory overhead. Since it's essentially just an archive copy, I don't mind if it takes half an hour to chew through it looking for some particular set of files. Since this is a format designed for tape drives, writing it causes very little random access. It's important that write performance is relatively consistent, since my crawler writes to the archive while it's crawling, and it can reach speeds of 50-100 documents per second, which would be extremely rough on any sort of database backed by a single mechanical hard drive. These archives are just an intermediate stage that's used if I need to reconstruct the index to tweak, say, keyword extraction or something, so random access performance isn't particularly useful.
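
To make the access pattern concrete, here's a minimal sketch in Python, assuming the tape-style format is a plain uncompressed tar file. The function names and paths are made up for illustration; this isn't the crawler's actual code. Writes only ever go to the end of the file, and a lookup is a front-to-back scan.

```python
import io
import tarfile
import time


def append_document(archive: tarfile.TarFile, name: str, body: bytes) -> None:
    """Append one crawled document to the end of an open tar archive."""
    info = tarfile.TarInfo(name=name)  # hypothetical naming scheme: host/path
    info.size = len(body)
    info.mtime = int(time.time())
    # addfile writes the header and then the body sequentially; no seeking back.
    archive.addfile(info, io.BytesIO(body))


def find_documents(path: str, prefix: str):
    """Scan the archive front to back, yielding (name, body) for matching members."""
    with tarfile.open(path, "r:") as archive:
        for member in archive:
            if member.isfile() and member.name.startswith(prefix):
                yield member.name, archive.extractfile(member).read()
```

The crawler side would open the archive once with `tarfile.open(path, "w")` and call `append_document` for each fetched page. The trade-off is the one described above: appends are cheap and steady, while `find_documents` has to walk the whole archive to find anything.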