


	by 1egg0myegg0 722 days ago
	Howdy! I work at MotherDuck and DuckDB Labs (part time as a blogger). At MotherDuck, we have both client-side and server-side compute! So the initial reduction from PB/TB to GB/MB can happen server side, and the results can be sliced and diced at top speed in your browser!

2 comments

victor106 719 days ago

Does DuckDB work with Delta files?


code_biologist 719 days ago

Please spend a sentence or two explaining the server-side filtering mechanism and linking to documentation! I would like to know the conditions required for streaming queries! From the sibling comment and a search of the docs, it seems like this is a Parquet-only feature, which seems pretty important to note!


FridgeSeal 719 days ago

Parquet is designed with predicate push-down in mind. Partitions are laid out on disk, and blocks within files are laid out so that consumers can very easily narrow down which files they need to read before doing any more I/O than a list or a small metadata read.

Once you know what you are reading, many Parquet/Arrow libraries support streaming reads and aggregations, so the client doesn’t need to load the whole working set in memory.
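
To make that concrete, here is a rough sketch in plain Python of the idea: use per-block min/max metadata to skip blocks that cannot match, then stream an aggregate over the rest. This is a toy model and not the Parquet format or any library's API; `RowGroup`, `scan_greater_than`, and the `stats` counter are made-up names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class RowGroup:
    """Toy stand-in for a block of rows with min/max metadata (hypothetical)."""
    values: List[int]
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self) -> None:
        # In a real file these statistics are written once, at write time.
        self.lo = min(self.values)
        self.hi = max(self.values)


def scan_greater_than(
    groups: Iterable[RowGroup], threshold: int, stats: Dict[str, int]
) -> Iterator[int]:
    """Yield values > threshold, skipping groups whose max rules them out."""
    for group in groups:
        if group.hi <= threshold:
            stats["skipped"] += 1  # decided from metadata alone, no data read
            continue
        stats["read"] += 1
        for value in group.values:
            if value > threshold:
                yield value  # streamed one value at a time


groups = [RowGroup([1, 2, 3]), RowGroup([10, 20, 30]), RowGroup([4, 50, 6])]
stats = {"read": 0, "skipped": 0}
total = sum(scan_greater_than(groups, 5, stats))  # never materializes a list
print(total, stats)
```

The first group is skipped purely on its max value, and `sum` consumes the generator lazily, so the working set never has to sit in memory all at once.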


LunaSea 719 days ago

This only covers very simple min / max / sum cases.

For all others you'll need to download all columns you are filtering or selecting.
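
A small sketch of what that looks like, assuming a toy column-oriented store written in plain Python (`ColumnStore`, `fetch`, and `select_where` are hypothetical names, not a real library): a substring filter can't be answered from min/max metadata, so the filtered column and the selected column are both fetched in full, while untouched columns are not.

```python
from typing import Callable, Dict, List, Set


class ColumnStore:
    """Toy column-oriented table that records which columns were fetched."""

    def __init__(self, columns: Dict[str, List]) -> None:
        self._columns = columns
        self.fetched: Set[str] = set()

    def fetch(self, name: str) -> List:
        # Stands in for downloading one whole column from remote storage.
        self.fetched.add(name)
        return self._columns[name]


def select_where(
    store: ColumnStore,
    select: str,
    filter_col: str,
    predicate: Callable[[object], bool],
) -> List:
    """Return `select` values for rows where predicate(filter_col) holds."""
    mask = [predicate(v) for v in store.fetch(filter_col)]
    wanted = store.fetch(select)
    return [v for v, keep in zip(wanted, mask) if keep]


store = ColumnStore({
    "name": ["duck", "goose", "mallard duck", "swan"],
    "weight_kg": [1.2, 4.0, 1.1, 10.5],
    "notes": ["a", "b", "c", "d"],
})
result = select_where(store, "weight_kg", "name", lambda s: "duck" in s)
print(result, sorted(store.fetched))
```

Only `name` and `weight_kg` are fetched here; `notes` is never touched, which is the column-projection saving you still get even when the predicate itself can't be pushed down.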


justincormack 719 days ago

Not specific to Ducks, but S3 Select https://docs.aws.amazon.com/AmazonS3/latest/userguide/select... can filter Parquet server side on S3 and is supported by some other object stores.
