by rjeli 1896 days ago

I have been looking for exactly this for a while now:

“For my usecase, I wanted a way to:

- Stash my data in its "original" JSON form
- Explore it later & build whatever views I want
- Keep costs & infrastructure complexity low
- Self-host it / own my data”

However, I think compression is a key part. Large scraped json datasets can be compressed by a few orders of magnitude, which is the difference between scp’ing a db up to aws in minutes vs days. SQLite has an official proprietary extension to DEFLATE backing pages, but that’s not easy to buy/distribute for hobby projects. I’ve tried row compression with zstd dictionaries and it works well, but then you lose native indexing.
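To make the row-compression trade-off concrete, here is a minimal sketch. It assumes Python's standard `zlib` with a preset dictionary (`zdict`) as a stand-in for zstd dictionaries, so it shows the same idea without being the zstd setup described above. The sample documents, the `docs` table, and the `pack`/`unpack` helpers are all hypothetical names made up for illustration.

```python
import json
import sqlite3
import zlib

# Hypothetical sample rows standing in for a scraped JSON dataset.
docs = [
    {"id": i, "type": "comment", "author": f"user{i % 7}", "text": "hello world " * 3}
    for i in range(200)
]
raw = [json.dumps(d, separators=(",", ":")).encode() for d in docs]

# Shared preset dictionary: a few sample rows concatenated verbatim.
# zstd tooling would train a dictionary instead; this is a simplification.
zdict = b"".join(raw[:20])


def pack(row: bytes) -> bytes:
    """Compress one row with DEFLATE, primed with the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(row) + c.flush()


def unpack(blob: bytes) -> bytes:
    """Decompress one row; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(d["id"], pack(r)) for d, r in zip(docs, raw)],
)

raw_size = sum(len(r) for r in raw)
packed_size = conn.execute("SELECT SUM(LENGTH(body)) FROM docs").fetchone()[0]

# The JSON fields are opaque to SQLite once compressed, so filtering on
# "author" means decompressing every row in application code.
matches = [
    row_id
    for row_id, blob in conn.execute("SELECT id, body FROM docs ORDER BY id")
    if json.loads(unpack(blob))["author"] == "user3"
]
```

The last query is the "lose native indexing" part: SQLite only sees opaque blobs, so any field you want to index has to be copied into its own uncompressed column at insert time.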

Mongo wiredtiger does pretty much what I want, it’s just not a neat flat file :/

2 comments

adamlouis 1896 days ago

Nice! Glad it resonated. Never quite sure how a project like this will land.

Thanks for sharing those - will check them out. Interested to see what happens as the size of the dataset grows.

I have not looked deeply, but Typesense[1] seems like another interesting project. Similar to ES or Algolia, easy to self-host, & with a seemingly efficient memory & disk footprint.

[1]https://github.com/typesense/typesense

NicoJuicy 1896 days ago

Except for the flat file part. What was lacking with Postgres?

It has decent JSON support and V8 JavaScript built in.

rjeli 1896 days ago

No compression of JSON docs. JSONB is even bigger than a JSON text column. There’s an extension out there, but it requires manual compression passes and doesn’t allow indexing. Might as well use SQLite.

mnahkies 1896 days ago

I think it would probably just make sense to run Postgres on a compressed filesystem rather than use an extension.

Looks like Citus had good results: https://www.citusdata.com/blog/2013/04/30/zfs-compression/

NicoJuicy 1896 days ago

Thanks for the informative answer!
