Hacker News
by oa335 207 days ago
Vortex is a file format, whereas Delta Lake and Iceberg are table formats. It should be compared to Parquet rather than to Delta Lake and Iceberg. This guest lecture by a maintainer of Vortex provides a good overview of the file format, the motivations for its creation, and its key features.

https://www.youtube.com/watch?v=zyn_T5uragA

2 comments

The website could use a comparison with Parquet and a statement of motivation (beyond just stating it's 100x better).
Agreed, really need a tl;dr here, because Parquet is boring technology. Going to require quite the sales pitch to move. At minimum, I assume it will be years before I could expect native integration in pandas/polars/etc., which would make it low effort enough to consider.

Parquet is... fine, I guess. It is good enough. Why invoke churn? Sell me on the vision.

DuckDB just added support for Vortex in their last release using the Vortex Python package, so hopefully other tools won't be too far behind.
> Going to require quite the sales pitch to move.

Mutability would be one such pitch I would like to see ...

I think it would still make sense to compare with those table formats, or is the idea that you would only use this if you could not use a table format?
That’s like comparing words with characters.

Vortex is, roughly, how you save data to files and Iceberg is the database-like manager of those files. You’ll soon be able to run Iceberg using Vortex because they are complementary, not competing, technologies.
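
To make the file format vs. table format split concrete, here is a minimal toy sketch in Python. It does not use the Vortex or Iceberg APIs. CSV stands in for the file format (the role Parquet or Vortex would play), and a hypothetical `manifest.json` stands in for the table format (the role Iceberg or Delta Lake would play). All function and file names are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_data_file(path, rows):
    """File-format layer: decides how rows are encoded inside ONE file.
    CSV is only a stand-in here for a real format such as Parquet or Vortex."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)


def read_data_file(path):
    """File-format layer: decodes one file back into rows."""
    with open(path, newline="") as f:
        return [{"id": int(r["id"]), "value": r["value"]} for r in csv.DictReader(f)]


def commit_snapshot(table_dir, data_files):
    """Table-format layer (hypothetical manifest): records WHICH files
    make up the table at this point in time. It never looks inside them."""
    manifest = Path(table_dir) / "manifest.json"
    snapshots = json.loads(manifest.read_text()) if manifest.exists() else []
    snapshots.append({"snapshot_id": len(snapshots) + 1, "files": list(data_files)})
    manifest.write_text(json.dumps(snapshots, indent=2))
    return snapshots[-1]["snapshot_id"]


def scan_table(table_dir, snapshot_id=None):
    """Read the table as of a snapshot (latest by default)."""
    snapshots = json.loads((Path(table_dir) / "manifest.json").read_text())
    snap = snapshots[-1] if snapshot_id is None else snapshots[snapshot_id - 1]
    rows = []
    for name in snap["files"]:
        rows.extend(read_data_file(Path(table_dir) / name))
    return rows


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        write_data_file(Path(table_dir) / "part-1.csv", [{"id": 1, "value": "a"}])
        commit_snapshot(table_dir, ["part-1.csv"])
        write_data_file(Path(table_dir) / "part-2.csv", [{"id": 2, "value": "b"}])
        commit_snapshot(table_dir, ["part-1.csv", "part-2.csv"])
        # Return the table as of snapshot 1, then as of the latest snapshot.
        return scan_table(table_dir, snapshot_id=1), scan_table(table_dir)
```

In this sketch, swapping the file format means changing only `write_data_file` and `read_data_file`; the manifest logic is untouched. That is the sense in which the two layers are complementary rather than competing.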