by Gepsens 1700 days ago
It's not the same. DataFusion comes with Ballista, with the goal of replacing Spark for many use cases. It also supports JSON and Avro.
1 comment

Yes, they're not the same, but they serve the same purpose. Honestly, it doesn't matter whether DataFusion or DuckDB uses the Arrow memory layout. What matters is their ability to run SQL queries (or MapReduce workloads) on CSV/Parquet files _WITHOUT COPYING THEM_.

If you start comparing them to solutions that copy datasets, you haven't understood what problem they are solving. If you want a system that loads the data into its own storage, use PostgreSQL or BigQuery.
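
To make the distinction concrete, here is a rough stdlib-only sketch. It is not how DuckDB or DataFusion work internally; it only contrasts the two access patterns. The function names (`sum_in_place`, `sum_after_copy`, `demo`) and the sample data are made up for illustration, and SQLite merely stands in for "a database that ingests the data first".

```python
import csv
import os
import sqlite3
import tempfile


def sum_in_place(path, column):
    """Aggregate by streaming over the file; no second copy of the data is stored."""
    with open(path, newline="") as f:
        return sum(float(row[column]) for row in csv.DictReader(f))


def sum_after_copy(path, column):
    """Load every row into a database table first, then query that copy."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (value REAL)")
    with open(path, newline="") as f:
        rows = ((float(row[column]),) for row in csv.DictReader(f))
        con.executemany("INSERT INTO t VALUES (?)", rows)
    (total,) = con.execute("SELECT SUM(value) FROM t").fetchone()
    con.close()
    return total


def demo():
    # Hypothetical sample data, written to a temporary CSV file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False
    ) as f:
        writer = csv.writer(f)
        writer.writerow(["id", "amount"])
        writer.writerows([[1, 10.5], [2, 20.0], [3, 4.5]])
        path = f.name
    try:
        return sum_in_place(path, "amount"), sum_after_copy(path, "amount")
    finally:
        os.remove(path)
```

Both functions return the same total. The difference is that the second one has to ingest the whole dataset before it can answer anything, which is the step the in-place engines are designed to skip.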

What have I not understood? Please enlighten me.