by 1egg0myegg0 722 days ago
Howdy! I work at MotherDuck and DuckDB Labs (part-time as a blogger). At MotherDuck, we have both client-side and server-side compute! So the initial reduction from PB/TB to GB/MB can happen server-side, and the results can be sliced and diced at top speed in your browser!
2 comments

Does duckdb work with delta files?
Please spend a sentence or two explaining the server-side filtering mechanism and linking to documentation! I would like to know the conditions required for streaming queries! From the sibling comment and a search of the docs, it seems like this is a Parquet-only feature, which seems pretty important to note!
Parquet is designed with predicate push-down in mind. Partitions are laid out on disk, and blocks within each file carry their own metadata, so consumers can very easily narrow down which files and blocks they need to read before doing any more I/O than a directory listing or a small metadata read.
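
To make the pruning idea concrete, here is a toy sketch in plain Python. It is not real Parquet and uses no Parquet library: `RowGroup` and `scan_greater_than` are names invented for illustration, and the min/max fields stand in for the per-block statistics a real file would store in its metadata.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a block of rows plus its min/max statistics."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once, at "write" time, and kept as metadata.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return the matching values and how many groups were actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Metadata-only check: if the max is not above the threshold,
        # nothing in this group can match, so its data is never touched.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


groups = [RowGroup([1, 2, 3]), RowGroup([10, 20, 30]), RowGroup([4, 5, 6])]
matches, groups_read = scan_greater_than(groups, 8)
print(matches, groups_read)  # [10, 20, 30] 1
```

Only one of the three groups is read here; the other two are ruled out from their statistics alone.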

Once you know what you are reading, many Parquet/Arrow libraries support streaming reads and aggregations, so the client doesn’t need to load the whole working set into memory.
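
As a rough illustration of the streaming part, here is a plain-Python sketch that aggregates batch by batch instead of materializing everything. `read_batches` and `streaming_sum` are hypothetical helpers, not any library's API; a real reader would yield record batches from the file rather than slices of a Python iterable.

```python
from typing import Iterable, Iterator, List


def read_batches(values: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield fixed-size batches, standing in for a library's streaming reader."""
    batch: List[int] = []
    for v in values:
        batch.append(v)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def streaming_sum(batches: Iterable[List[int]]) -> int:
    """Fold each batch into a running total; only one batch is held at a time."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total


total = streaming_sum(read_batches(range(1, 11), batch_size=3))
print(total)  # 55
```

Peak memory is bounded by the batch size rather than by the size of the data being scanned.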

This only covers very simple min / max / sum cases.

For all others, you'll need to download every column you are filtering on or selecting.

Not specific to Ducks, but S3 Select (https://docs.aws.amazon.com/AmazonS3/latest/userguide/select...) can filter Parquet server-side on S3, and some other object stores support it too.