by FridgeSeal
1692 days ago
Arrow is a language-independent columnar memory layout. It’s designed so that you can stream data from (for example) a Rust data source to Spark, DataFusion, Python or whatever else, with higher throughput, support for zero-copy reads, and no serialisation/deserialisation overhead. Sharing the same memory model also gives better type and layout consistency, and means query engines can get on with optimising and running queries rather than having to worry about IO optimisations too. I’m using DataFusion (via Rust) and it’s pretty fantastic. Would love to swap out some Spark stuff for DataFusion/Ballista at some point.
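To make the zero-copy point a bit more concrete, here's a rough sketch using only the Python standard library. It is not Arrow itself: `array` and `memoryview` are stand-ins I'm assuming for illustration, and the names `column`, `view` and `tail` are made up. The idea is the same though: the consumer reads the producer's contiguous buffer directly instead of getting a serialised copy.

```python
import array

# Producer side: a column of C ints held in one contiguous buffer,
# loosely similar to how a columnar format stores a primitive column.
column = array.array("i", [10, 20, 30, 40])

# Consumer side: a memoryview exposes the same bytes without copying them.
view = memoryview(column)

# Slicing a memoryview is also zero-copy; it just points into the buffer.
tail = view[2:]

# Nothing was copied, so a write by the producer is visible to the consumer.
column[3] = 99

total = sum(view)
tail_values = tail.tolist()
print(total, tail_values)
```

With real Arrow the buffer layout is standardised, so the same trick can work across languages and processes rather than just inside one Python interpreter.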