| HN Mirror

I'm curious whether you could use this not for data science tasks but for data engineering tasks: say, reading a CSV or pulling a table from Oracle and storing it as a Delta Lake table, or something similar.

I know it's a boring use case, but the challenge is this: using Spark to process a 20 MB CSV or a table with a few thousand records is a complete waste of money and carbon footprint, while tools like Pandas fall apart once you hit a 50 GB CSV or a table with a few billion records.

Something more efficient (say, written in Rust rather than Python or Java) and yet scalable (because it doesn't need to fit everything into memory) would be a great help here.
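To illustrate what "not fitting everything into memory" means in practice, here is a rough sketch using only the Rust standard library. It streams rows and processes them in fixed-size batches, so memory use is bounded by the batch size rather than by the file size. The function names `stream_csv_in_batches` and `flush_batch` are hypothetical. The CSV handling is deliberately naive (a plain comma split, no quoted fields), and the text `sink` only stands in for a real target: writing an actual Delta Lake table would need a separate writer library, which isn't shown here.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Hypothetical helper: streams a CSV source in fixed-size batches so that
/// memory use is bounded by `batch_size` rows, not by the size of the input.
/// Assumption: naive comma splitting with no quoted or escaped fields.
fn stream_csv_in_batches<R: Read, W: Write>(
    input: R,
    batch_size: usize,
    mut sink: W,
) -> io::Result<usize> {
    // A zero batch size would never trigger a flush and buffer everything.
    assert!(batch_size > 0, "batch_size must be positive");
    let mut lines = BufReader::new(input).lines();

    // Read the header once, forward it to the sink, and use it to count columns.
    let header = match lines.next() {
        Some(h) => h?,
        None => return Ok(0),
    };
    let n_cols = header.split(',').count();
    writeln!(sink, "{}", header)?;

    let mut batch: Vec<Vec<String>> = Vec::with_capacity(batch_size);
    let mut total = 0;
    for line in lines {
        let line = line?;
        // Split and trim each field; rows with the wrong column count are skipped.
        let fields: Vec<String> = line.split(',').map(|f| f.trim().to_string()).collect();
        if fields.len() != n_cols {
            continue;
        }
        batch.push(fields);
        // Flush as soon as the batch is full, so at most `batch_size` rows are held.
        if batch.len() == batch_size {
            total += flush_batch(&mut batch, &mut sink)?;
        }
    }
    // Flush whatever is left in the final, possibly partial, batch.
    total += flush_batch(&mut batch, &mut sink)?;
    Ok(total)
}

/// Writes one batch to the sink (a stand-in for a real table writer) and clears it,
/// returning the number of rows written.
fn flush_batch<W: Write>(batch: &mut Vec<Vec<String>>, sink: &mut W) -> io::Result<usize> {
    let n = batch.len();
    for row in batch.iter() {
        writeln!(sink, "{}", row.join(","))?;
    }
    batch.clear();
    Ok(n)
}
```

Because `input` is any `Read` and `sink` is any `Write`, the same loop works for an in-memory byte slice, a `File`, or a network stream. Peak memory stays roughly constant whether the input is 20 MB or 50 GB; only the run time grows.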