One interesting feature of DuckDB is that it can run queries against HTTP ranges of a static file hosted via HTTPS, and there's an official WebAssembly build of it that can do that same trick.
So you can dump e.g. all of Hacker News in a single multi-GB Parquet file somewhere and build a client-side JavaScript application that can run queries against that without having to fetch the whole thing.
DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It's designed to provide high performance on complex queries against large databases in embedded configuration.
"DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST.
This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance."
https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...
It is very similar to SQLite in that it can run in-process and store its data as a file.
It's different in that it is tailored to analytics, among other things storage is columnar, and it can run off some common data analytics file formats.
Hey jacquesm! No, I just forgot to make it public.
BUT I did try to push the entire 10GB of shards to GitHub (no LFS, no thanks, money), and after the 20 minutes compressing objects etc, "remote hang up unexpectedly"
To be expected I guess. I did not think GH Pages would be able to do this. So have been repeating:
Pretty neat project. I never thought you could do this in the first place, very much inspiring. I've made a little project that stores all of its data locally but still runs in the browser to protect against take downs and because I don't think you should store your precious data online more than you have to, eventually it all rots away. Your project takes this to the next level.
That's really cool, man. The music notation is beautiful. I hit play but couldn't get it to progress past the first note. Maybe I need to plug in a midi keyboard? It would be cool if I could "play" with my ASCII keyboard.
Listen was nice. That's really cool, actually. I encourage you to do it.
I was thinking more the numeric columns which have pre-built compression mechanisms to handle incrementing columns or long runs of identical values. For sure less total data than the text, but my prior is that the two should perform equivalently on the text, so the better compression on numbers should let duckdb pull ahead.
I had to run a test for myself, and using sqlite2duckdb (no research, first search hit), and using randomly picked shard 1636, the sqlite.gz was 4.9MB, but the duckdb.gz was 3.7MB.
The uncompressed sizes favor sqlite, which does not make sense to me, so not sure if duckdb keeps around more statistics information. Uncompressed sqlite 12.9MB, duckdb 15.5MB
Not the author here. I’m not sure about DuckDB, but SQLite allows you to simply use a file as a database and for archiving, it’s really helpful. One file, that’s it.
At a glance, that is missing (at least) a `parent` or `parent_id` attribute which items in HN can have (and you kind of need if you want to render comments), see http://hn.algolia.com/api/v1/items/46436741