HN Mirror



	by Waterluvian 1201 days ago
	Mind you, this isn’t appropriate for most cases. But I love the idea of “you start with a text file. You end with a text file. All the database stuff, indexes, etc. are just a detail.” Often I find that the database wants to be the authority, and that makes working with different formats a bit uncomfortable.
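A minimal sketch of that workflow, assuming a JSON Lines file is the source of truth (the file name, the `id` key, and the helper names `load_index` and `save` are hypothetical): the index is rebuilt from the text on every load and discarded afterwards, and every write goes back out as plain text.

```python
import json
import os
import tempfile


def load_index(path: str, key: str) -> dict:
    """Build a throwaway in-memory index over a JSON Lines file.

    The text file stays the authority; this dict is just a derived detail.
    """
    index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            index[record[key]] = record
    return index


def save(path: str, records) -> None:
    """Write records back out as plain text, one JSON object per line."""
    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")
    os.replace(tmp, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "people.jsonl")
        save(path, [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}])
        people = load_index(path, "id")
        people[2]["name"] = "Grace"
        save(path, people.values())
        with open(path, encoding="utf-8") as f:
            print(f.read())
```

Writing through a temporary file and `os.replace` keeps the text file intact if the process dies mid-write, which matters once that file is the only authority.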


jmartin2683 1201 days ago

We’re currently building real-time APIs backed by terabytes of compressed Parquet (hundreds of billions of ‘rows’) in exactly this fashion, using Polars. It amazes us at every turn.

Join us and help!

ploppyploppy 1200 days ago

What project?

Do you mean Polars reading Parquet into DuckDB to process that amount of data?

jmartin2683 1193 days ago

Internal. We're using Polars as the query engine to query that data statically at rest (more accurately, mmap'd on disk in Arrow IPC format).
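Polars and the Arrow IPC format are not reproduced here. As a stdlib-only sketch of the underlying idea of querying data at rest through `mmap`, the code below memory-maps a hypothetical file of little-endian int64 values (a far simpler layout than Arrow IPC) and filters it in place; `write_column` and `count_greater` are illustrative names.

```python
import mmap
import os
import struct
import tempfile

# Assumed on-disk layout: a bare run of little-endian int64 values, no header.
ITEM = struct.Struct("<q")


def write_column(path: str, values) -> None:
    """Write a fixed-width int64 column to disk (hypothetical format)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def count_greater(path: str, threshold: int) -> int:
    """Count values > threshold by scanning the memory-mapped file in place."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # mmap refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            # The OS pages data in on demand; nothing is copied into a list first.
            for offset in range(0, len(mm), ITEM.size):
                (value,) = ITEM.unpack_from(mm, offset)
                if value > threshold:
                    count += 1
            return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "col.bin")
        write_column(path, range(1_000))
        print(count_greater(path, 989))  # counts the values 990..999
```

Unpacking one value at a time keeps the sketch readable; engines such as Polars vectorize this kind of scan instead.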

truculent 1201 days ago

What does this look like in practice? Using the filesystem as a database?
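One way to read "the filesystem as a database", sketched under assumptions: a directory acts as a table, each file holds one value, and the file name is the key. The `FileStore` class is hypothetical and ignores concurrency, transactions, and any indexing beyond what the directory itself provides.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


class FileStore:
    """Toy key-value store: the directory is the table, each file is one row.

    Hypothetical design for illustration; keys must be plain file names.
    """

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Reject keys that could escape the directory or collide with temp files.
        if not key or key.startswith(".") or "/" in key or "\\" in key:
            raise ValueError(f"unsafe key: {key!r}")
        return self.root / key

    def put(self, key: str, value: str) -> None:
        target = self._path(key)
        tmp = self.root / f".{key}.tmp"
        tmp.write_text(value, encoding="utf-8")
        # Rename is atomic on POSIX within one filesystem: readers see old or new.
        os.replace(tmp, target)

    def get(self, key: str) -> Optional[str]:
        try:
            return self._path(key).read_text(encoding="utf-8")
        except FileNotFoundError:
            return None

    def keys(self) -> List[str]:
        # Listing the directory is the "full table scan".
        return sorted(p.name for p in self.root.iterdir() if not p.name.startswith("."))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        users = FileStore(os.path.join(d, "users"))
        users.put("alice", "admin")
        users.put("bob", "viewer")
        print(users.keys(), users.get("alice"))
```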

masukomi 1201 days ago

GNU Recutils (https://www.gnu.org/software/recutils/) is a good example of an actual database that uses plaintext files in your filesystem.

I can see the argument that doing this with JSON is better (or worse), but regardless, Recutils is an interesting idea that I wish more people knew about. I can imagine a lot of cool things emerging if people iterated on the idea.
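To give a feel for the format, here is a sketch of a parser for a simplified subset of rec files: records separated by blank lines, `Field: value` lines, `+` continuation lines, and `#` comments. It skips `%` record descriptors instead of enforcing them, and `parse_rec`/`select` are illustrative helpers, not part of Recutils; the real query tool there is `recsel`.

```python
from typing import Callable, Dict, List, Optional

# A field name may repeat within one record, so each maps to a list of values.
Record = Dict[str, List[str]]


def parse_rec(text: str) -> List[Record]:
    """Parse a simplified subset of the GNU rec format into records."""
    records: List[Record] = []
    current: Record = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current record, if any.
            if current:
                records.append(current)
            current, last_field = {}, None
            continue
        if line.startswith("%"):
            continue  # record descriptors are ignored in this sketch
        if line.startswith("+") and last_field is not None:
            # "+ more text" continues the previous field's last value on a new line.
            cont = line[1:]
            if cont.startswith(" "):
                cont = cont[1:]
            current[last_field][-1] += "\n" + cont
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a field line: {line!r}")
        current.setdefault(name, []).append(value.strip())
        last_field = name
    if current:
        records.append(current)
    return records


def select(records: List[Record], pred: Callable[[Record], bool]) -> List[Record]:
    """Keep the records for which pred is true (loosely analogous to recsel -e)."""
    return [r for r in records if pred(r)]


if __name__ == "__main__":
    books = parse_rec(
        "# my library\n"
        "%rec: Book\n"
        "\n"
        "Title: GNU Emacs Manual\n"
        "Location: home\n"
        "\n"
        "Title: The Colour of Magic\n"
        "Location: loaned\n"
    )
    home = select(books, lambda r: r.get("Location") == ["home"])
    print([r["Title"][0] for r in home])
```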

necrotic_comp 1201 days ago

Recutils is great, but it needs a rewrite, I think.

vlovich123 1201 days ago

Anything that stores data on a computer is essentially a database. It's all about representation and what kinds of operations you prioritize for performance.
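To make the representation trade-off concrete, here are two toy stores for the same key-value data (the class names are illustrative): an append-only log makes writes trivial and keeps history but pays a linear scan on reads, while a hash table makes point lookups fast but keeps only the latest value.

```python
from typing import Dict, List, Optional, Tuple


class AppendLog:
    """Writes are O(1) appends; a lookup scans the log, newest entry wins."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self.entries.append((key, value))  # old values stay in the log

    def get(self, key: str) -> Optional[str]:
        for k, v in reversed(self.entries):  # scan from newest to oldest
            if k == key:
                return v
        return None


class HashIndex:
    """Lookups are O(1) on average; writes overwrite in place, losing history."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.table[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.table.get(key)
```

Both answer the same queries; they differ only in which operations are cheap and what information survives an update.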

mcdonje 1201 days ago

Apache Spark / Databricks is an example of this. Parquet files are stored in folders. A folder is assumed to hold one dataset split into multiple files based on specified partition criteria. The VMs read the necessary files into memory and then operate on the data.
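Spark's `DataFrameWriter.partitionBy` lays partitioned output out as Hive-style `column=value` directories, so a filter on a partition column can skip whole folders before any Parquet is read. The stdlib sketch below reproduces only that pruning step on file paths; the `prune` helper, the directory names, and the filters are hypothetical, and no Parquet is actually parsed.

```python
import tempfile
from pathlib import Path
from typing import List


def prune(root: Path, **filters: str) -> List[Path]:
    """List .parquet files under Hive-style 'col=value' folders that match filters.

    A folder whose column is filtered on but whose value differs is skipped
    without being listed. Filters on columns that are not partition folders
    are ignored here; an engine would apply those after reading the data.
    """
    matches: List[Path] = []

    def walk(folder: Path) -> None:
        for child in sorted(folder.iterdir()):
            if child.is_dir():
                column, sep, value = child.name.partition("=")
                if sep and column in filters and filters[column] != value:
                    continue  # pruned: this whole subtree is never visited
                walk(child)
            elif child.suffix == ".parquet":
                matches.append(child)

    walk(root)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for year in ("2022", "2023"):
            for region in ("eu", "us"):
                folder = root / f"year={year}" / f"region={region}"
                folder.mkdir(parents=True)
                (folder / "part-00000.parquet").write_bytes(b"")  # empty placeholder
        for path in prune(root, year="2023"):
            print(path.relative_to(root))
```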

bobleeswagger 1201 days ago

Isn't Linux a good example of this? Everything is a file.
