by jagtesh
1510 days ago
First of all, thanks for sharing this, OP! So glad to see a way to query a df using SQL without further transformation. Arrow has been truly revolutionary in this regard, providing a solid in-memory data format (with performant APIs in many languages) for interchange between different engines and even formats. You can go from ORC to Parquet to CSV on a local FS or S3. With DuckDB, it's like you can build your own AWS Athena, likely at a fraction of the cost. Now if only someone would integrate vaex with DuckDB, it would make those powerful Apple Silicon machines a compelling alternative to running a full-fledged Spark/Hadoop cluster.
Isn't the whole purpose of Athena to scale to large amounts of data that don't fit into memory? How does DuckDB fit in here? I thought it was an in-memory database?