by rjeli
1896 days ago
I have been looking for exactly this for a while now: “For my usecase, I wanted a way to:
Stash my data in its "original" JSON form
Explore it later & build whatever views I want
Keep costs & infrastructure complexity low
Self-host it / own my data”

However, I think compression is a key part. Large scraped JSON datasets can be compressed by a few orders of magnitude, which is the difference between scp’ing a db up to AWS in minutes vs. days. SQLite has an official proprietary extension that DEFLATE-compresses the backing pages, but that’s not easy to buy or distribute for hobby projects. I’ve tried row compression with zstd dictionaries and it works well, but then you lose native indexing. MongoDB’s WiredTiger does pretty much what I want; it’s just not a neat flat file :/
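A minimal sketch of the row-compression idea described above, under stated assumptions: it uses Python's standard-library zlib with a preset dictionary (the `zdict` argument) as a stand-in for zstd dictionaries, and the `docs` table, the sample records, and the helper names are all hypothetical. It also shows why native indexing is lost: SQLite only sees an opaque blob.

```python
import json
import sqlite3
import zlib

# Hypothetical sample records standing in for a scraped JSON dataset.
records = [
    {"id": i, "type": "listing", "title": f"item {i}", "price": i * 10, "currency": "USD"}
    for i in range(200)
]

# Preset dictionary: bytes expected to recur in most rows. Here it is just one
# sample row; zstd can instead train a dictionary from many samples.
ZDICT = json.dumps(records[0]).encode("utf-8")


def compress_row(doc: dict) -> bytes:
    """Serialize one document and DEFLATE it against the shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=ZDICT)
    return comp.compress(json.dumps(doc).encode("utf-8")) + comp.flush()


def decompress_row(blob: bytes) -> dict:
    """Inverse of compress_row; needs the exact same dictionary bytes."""
    decomp = zlib.decompressobj(zdict=ZDICT)
    return json.loads(decomp.decompress(blob) + decomp.flush())


conn = sqlite3.connect(":memory:")
# The body column is opaque to SQLite, so fields inside the JSON cannot be
# indexed or filtered in SQL. Only the separately stored id is queryable.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(doc["id"], compress_row(doc)) for doc in records],
)

raw_size = sum(len(json.dumps(doc).encode("utf-8")) for doc in records)
packed_size = conn.execute("SELECT SUM(LENGTH(body)) FROM docs").fetchone()[0]

(blob,) = conn.execute("SELECT body FROM docs WHERE id = ?", (42,)).fetchone()
restored = decompress_row(blob)
print(f"raw={raw_size} bytes, compressed={packed_size} bytes")
```

Any field that needs to be searched would have to be copied into its own uncompressed column (as `id` is here) or decompressed in application code, which is the trade-off against page-level compression.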
Thanks for sharing those - will check them out. Interested to see what happens as the size of the dataset grows.
I have not looked deeply, but Typesense [1] seems like another interesting project. It is similar to Elasticsearch or Algolia, easy to self-host, and seems to have an efficient memory and disk footprint.
[1] https://github.com/typesense/typesense