| HN Mirror

It's not always clear which pushdowns are available in DuckDB. For instance, while x = y has been available for a while, x in (y, z, ...) hasn't. The DuckDB team seems very eager and motivated to get all the pushdown functionality working though, so hopefully becomes a non-issue soon (perhaps already in 0.10.0).

Another way to use DuckDB if you're warehouse supports it would be e2e Arrow (no col->row->col overhead).

    warehouse --> ADBC --> arrow --> DuckDB

Of course this differs in that you would be reading from the warehouse directly, but in my experience fully pre-aggregating data and then staging it (keeping parquet files on S3 up to date) might solve one issue, but results in unimaginable issues. Perhaps the sweet spot might be something like Iceberg in the middle?

    warehouse --> iceberg table (parquet) --> DuckDB

> craft corresponding data loaders

Do you have an example of this using DuckDB? I'm very interested in seeing an actual implementation of Observable's data loaders combined with DuckDB (or any other SQL DB)

edit: nm, found it. https://observablehq.com/framework/lib/duckdb

edit2: eh, I didn't even realize who i was responding to, lol. The more I read into this, the more I see how this is all heavily based on static files. So the static parquet files thing makes more sense, the solution I added makes little. Although I guess you could add a static iceberg table and interact with its manifest with the duckdb iceberg extension.