Perhaps we should clean up the wording in the intro, but yes there is in fact a file format!
We actually built the toolkit before the file format. The interesting thing here is that we have a consistent in-memory and on-disk representation of compressed, typed arrays.
This is nice for a couple of reasons:
(a) It makes it really easy to test out new compression algorithms and compute functions. We just implement a new codec and it's automatically available for the file format.
(b) We spend a lot of energy on efficient pushdown. Many compute functions, such as slicing and cloning, are zero-cost, and all compute operations can execute directly over compressed data.
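To make (b) a bit more concrete, here's a minimal sketch of the general idea. This is not Vortex's actual API: `RleArray`, `encode`, `slice`, and `sum` are hypothetical names made up for illustration, and run-length encoding is just a stand-in for whatever codec you like. It shows a slice that only adjusts an offset and length over shared buffers, and a sum that works run by run without ever decompressing.

```rust
use std::sync::Arc;

/// Hypothetical run-length encoded array (illustration only, not a Vortex type).
/// `values[i]` repeats up to, but not including, logical index `ends[i]`.
#[derive(Clone, Debug)]
struct RleArray {
    values: Arc<[i64]>,
    ends: Arc<[usize]>,
    offset: usize, // logical start of this view within the encoded data
    len: usize,    // logical length of this view
}

impl RleArray {
    /// Compress a plain slice into runs.
    fn encode(data: &[i64]) -> Self {
        let mut values: Vec<i64> = Vec::new();
        let mut ends: Vec<usize> = Vec::new();
        for (i, &v) in data.iter().enumerate() {
            if values.last() == Some(&v) {
                // Same value as the current run: extend it.
                *ends.last_mut().unwrap() = i + 1;
            } else {
                // New value: start a new run.
                values.push(v);
                ends.push(i + 1);
            }
        }
        Self { values: values.into(), ends: ends.into(), offset: 0, len: data.len() }
    }

    /// Slicing copies no data: it bumps two reference counts and
    /// records a new offset and length.
    fn slice(&self, start: usize, stop: usize) -> Self {
        assert!(start <= stop && stop <= self.len);
        Self {
            values: Arc::clone(&self.values),
            ends: Arc::clone(&self.ends),
            offset: self.offset + start,
            len: stop - start,
        }
    }

    /// Sum computed directly over the runs, one multiply per run,
    /// clipping each run to the current view.
    fn sum(&self) -> i64 {
        let (lo, hi) = (self.offset, self.offset + self.len);
        let mut prev_end = 0;
        let mut total = 0;
        for (&v, &end) in self.values.iter().zip(self.ends.iter()) {
            let s = prev_end.max(lo);
            let e = end.min(hi);
            if s < e {
                total += v * (e - s) as i64;
            }
            prev_end = end;
        }
        total
    }
}
```

For example, encoding `[7, 7, 7, 3, 3, 9, 9, 9, 9]` stores three runs, and calling `slice(2, 6)` followed by `sum()` touches only those three runs while the slice shares the same buffers as the original. A real implementation would presumably binary-search the run ends rather than scan them all.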
I'd highly encourage you to check out the vortex-serde crate in the repo for file format things, and the vortex-datafusion crate for some examples of integrating the format into a query engine!