by dnamlin 1783 days ago
I plowed the 40 GiB database file into sqlite_zstd_vfs [1], which reduced it by 75% to 10 GiB. This plug-in also supports HTTP access [2] in the spirit of phiresky's; however, I don't have a WebAssembly build of mine yet, so it's a library for desktop/command-line apps for now. You can try it out on Linux or macOS x86-64:

  pip3 install genomicsqlite
  genomicsqlite https://f000.backblazeb2.com/file/mlin-public/static.wiki/en.zstd.db "select text from wiki_articles where title = 'SQLite'"
("genomicsqlite" is the CLI for my Genomics Extension [3], which is built around these Zstandard compression & web layers.)

[1] https://github.com/mlin/sqlite_zstd_vfs

[2] https://github.com/mlin/sqlite_web_vfs

[3] https://mlin.github.io/GenomicSQLite

EDITS: I expanded on this comment in this gist https://gist.github.com/mlin/ee20d7c5156baf9b12518961f36590c...

If you want to download the whole en.zstd.db, please get it from Zenodo (which doesn't support HTTP range requests, but is free): https://zenodo.org/record/5149677

1 comment

Great work, but why not compress with deflate if you are serving HTTP requests? Then you could copy the database content directly to the wire as gzip-encoded responses.
With sqlite_zstd_vfs the data are compressed beforehand and stored that way "at rest", so web responses are copied directly to the wire, controlled by HTTP range headers, similar to the OP. They then need to be decompressed by a client library sitting between SQLite and the wire.
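
To make the "compressed at rest, decompressed by the client" idea concrete, here is a toy sketch in Python. It is not how sqlite_zstd_vfs or sqlite_web_vfs are implemented: the flat blob plus offset index is an assumed layout for illustration only, zlib stands in for Zstandard so the example needs nothing beyond the standard library, and every function name here is hypothetical. The "server" just returns the requested byte range untouched, and all decompression happens on the client.

```python
import zlib

PAGE_SIZE = 4096  # hypothetical page size for this toy model


def build_compressed_file(pages):
    """Compress each page on its own and record where it lands in the blob."""
    blob = bytearray()
    index = []  # (offset, length) of each compressed page
    for page in pages:
        packed = zlib.compress(page)  # stand-in for Zstandard
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def range_header(offset, length):
    """HTTP Range header for `length` bytes starting at `offset` (end is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def serve_range(blob, header):
    """Stand-in for a static file host: return the requested bytes untouched."""
    first, last = header["Range"].split("=", 1)[1].split("-")
    return blob[int(first):int(last) + 1]


def read_page(blob, index, page_no):
    """Client side: fetch one compressed page by range, then decompress it locally."""
    offset, length = index[page_no]
    wire_bytes = serve_range(blob, range_header(offset, length))
    return zlib.decompress(wire_bytes)
```

The point of the sketch is that the host never needs to know about the compression format: it only has to honor range requests, which is why a plain static file host is enough.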