by placidpanda
1312 days ago
When I've done this in the past, I settled on a SQLite database with one table that stores the compressed HTML (gzip or lzma) alongside other columns (id, date, url, domain, status, etc.). That also made it easy to alert when something broke (query the table for count(*) where status = error) and to rerun the parser on the failures.
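A rough sketch of that kind of table, using only Python's standard library. The table name, column names, status values ('ok' / 'error') and helper functions are all made up for illustration, and lzma is picked arbitrarily over gzip:

```python
import lzma
import sqlite3
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Hypothetical schema: one row per fetched page, body stored compressed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id         INTEGER PRIMARY KEY,
    fetched_at TEXT NOT NULL,
    url        TEXT NOT NULL,
    domain     TEXT NOT NULL,
    status     TEXT NOT NULL,   -- 'ok' or 'error' (assumed values)
    html_xz    BLOB NOT NULL    -- LZMA-compressed page body
)
"""

def open_cache(path=":memory:"):
    """Open (or create) the cache database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def store_page(conn, url, html, status="ok"):
    """Compress the HTML and insert it with its metadata."""
    conn.execute(
        "INSERT INTO pages (fetched_at, url, domain, status, html_xz) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            url,
            urlsplit(url).hostname or "",
            status,
            lzma.compress(html.encode("utf-8")),
        ),
    )
    conn.commit()

def load_page(conn, url):
    """Return the most recently stored HTML for a URL, or None."""
    row = conn.execute(
        "SELECT html_xz FROM pages WHERE url = ? ORDER BY id DESC LIMIT 1",
        (url,),
    ).fetchone()
    return lzma.decompress(row[0]).decode("utf-8") if row else None

def error_count(conn):
    """The alerting query: how many rows are marked as failed."""
    return conn.execute(
        "SELECT count(*) FROM pages WHERE status = 'error'"
    ).fetchone()[0]

def failed_urls(conn):
    """URLs whose rows are marked as failed, to feed back into the parser."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE status = 'error' ORDER BY id"
    )
    return [r[0] for r in rows]
```

Alerting is then just a check that error_count(conn) is above zero, and a rerun is a loop over failed_urls(conn) that calls load_page and parses again, with no refetching needed.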
Storing pages as individual files is a no-go because it wastes too much disk space due to block sizes: every file takes up a whole number of filesystem blocks, however small it is. More customized cache tools, meanwhile, will never be as flexible or have as much tooling as a widely supported relational database.
For even better compression, you could also use a preset dictionary tuned to a wide sample of HTML, but it doesn't sound like you need to go that far.
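If you did want to try the preset dictionary idea, here is a minimal sketch using the zdict parameter of Python's zlib module (Deflate, the same algorithm gzip uses). The tiny HTML_DICT below is a hand-written placeholder, not a tuned dictionary; a real one would be built from a large sample of the pages you actually crawl, and the helper names are made up:

```python
import zlib

# Placeholder dictionary: byte strings expected to be common in the pages.
# Assumption for illustration only; build a real one from sample HTML.
HTML_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title>'
    b'</head><body><div class="</div><a href="https://</a><p></p>'
    b"</body></html>"
)

def compress_with_dict(data: bytes, zdict: bytes = HTML_DICT) -> bytes:
    """Deflate-compress data, priming the compressor with a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()

def decompress_with_dict(blob: bytes, zdict: bytes = HTML_DICT) -> bytes:
    """Inverse of compress_with_dict; must be given the same dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()

page = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title>'
    b"</head><body><p>Hello</p></body></html>"
)
with_dict = compress_with_dict(page)
plain = zlib.compress(page, 9)
```

The catch is that the exact same dictionary has to be available at decompression time, so it needs to be stored and versioned along with the database. The gain is mostly on small pages, where the compressor otherwise has little history to refer back to.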