Hacker News
by lmeyerov 343 days ago
Yeah, I'm happy to see this. We have been curious about it as part of figuring out cloud-native storage extensions to GFQL (a graph dataframe-native query language), and my intuition was that Parquet was pluggable here, but this is the first cogent writeup I'm seeing.

This also means, afaict, that it's pretty straightforward to do novel indexing schemes within Iceberg as well, just by reusing this.

The other aspect I've been curious about is the happy path for pluggable types for custom columns. This shows one way, but I'm unclear whether it's the same thing.

2 comments

We are actively working on supporting extension types. The mechanism is likely to be the Arrow extension type mechanism (a logical annotation on top of existing Arrow types https://arrow.apache.org/docs/format/Columnar.html#format-me...).
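For readers unfamiliar with the idea, here is a rough, library-free sketch of what "a logical annotation on top of existing Arrow types" means. It is a toy model, not the Arrow or DataFusion API: the `Field` class, the `annotate` and `logical_type` helpers, and the `example.geometry` name are all hypothetical. The only parts taken from the Arrow columnar format are the two reserved metadata keys, `ARROW:extension:name` and `ARROW:extension:metadata`.

```python
from dataclasses import dataclass, field

# Metadata keys the Arrow columnar format reserves for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Toy stand-in for an Arrow field (hypothetical, not a real Arrow class)."""
    name: str
    storage_type: str                      # physical type, e.g. "binary"
    metadata: dict = field(default_factory=dict)


def annotate(f, ext_name, ext_metadata=""):
    """Return a copy of `f` tagged as an extension type.

    The storage type is left untouched; only key/value metadata is added.
    """
    merged = {**f.metadata, EXT_NAME_KEY: ext_name, EXT_META_KEY: ext_metadata}
    return Field(f.name, f.storage_type, merged)


def logical_type(f):
    """Readers that know the annotation see the extension name;
    everything else falls back to the plain storage type."""
    return f.metadata.get(EXT_NAME_KEY, f.storage_type)


plain = Field("geom", "binary")
# "example.geometry" and its metadata string are made up for illustration.
tagged = annotate(plain, "example.geometry", '{"encoding": "wkb"}')

print(logical_type(plain), logical_type(tagged), tagged.storage_type)
```

The point of the design is that the bytes on disk or in memory stay an ordinary, already-supported type, so a reader that does not recognize the annotation can still read the column as its storage type.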

I expect this to be used to support Variant (https://github.com/apache/datafusion/issues/16116) and geometry types.

(note I am an author)

I'm not sure if this is what you're looking for, but there is a proposal in DataFusion to allow user-defined types: https://github.com/apache/datafusion/issues/12644
Thank you, looking forward to reading!