by jamesblonde
906 days ago
I strongly disagree with this:
"The best way to store Apache Arrow dataframes in files on disk is with Feather. However, it’s also possible to convert to Apache Parquet format and others." The best way to build your own non-JVM lakehouse is to use Iceberg for the metadata and Parquet for the data, query with DuckDB using Arrow tables (reading Parquet directly into Arrow is very cheap), and then convert Arrow to Pandas or Polars (either directly or via a service with Arrow Flight). If you put Feather in the mix, the whole Python lakehouse stack doesn't currently work.