HN Mirror


by menaerus 1364 days ago

This is exciting, but it left me wondering whether the approach will continue to scale with larger TPC-H scale factors. A scale factor of 1 is honestly very small.

Also, I didn't quite understand whether, in order to achieve this, DuckDB must:

1. Read the Postgres row formatted data

2. Transform the row formatted data into its internal columnar representation

3. Keep the representation in memory


1egg0myegg0 1364 days ago

You can think of the attach operation as creating views in DuckDB with Postgres tables underneath! DuckDB will then query those Postgres rows (using the typical Postgres wire protocol, except in binary mode).

No data is persisted in DuckDB unless you do an insert statement with the result of the Postgres scan. DuckDB does process that data in a columnar fashion once it has been pulled into DuckDB memory though!
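To make the "rows in, columnar processing after" idea concrete, here is a rough conceptual sketch in plain Python. It is not DuckDB's actual implementation and does not use the DuckDB or Postgres APIs; the function name `rows_to_column_chunks`, the sample rows, and the chunk size are all made up for illustration. It only shows row tuples being pivoted into per-column lists one batch at a time, so that only one batch needs to be materialized at once.

```python
from itertools import islice


def rows_to_column_chunks(rows, column_names, chunk_size=3):
    """Yield one dict per batch of rows, mapping column name -> list of values.

    Hypothetical helper for illustration only; not part of DuckDB.
    """
    it = iter(rows)
    while True:
        # Pull at most chunk_size rows from the row-oriented source.
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        # zip(*batch) transposes row tuples into per-column tuples.
        columns = [list(col) for col in zip(*batch)]
        yield dict(zip(column_names, columns))


# Made-up rows, standing in for what a row-oriented source might return.
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0), (4, "d", 40.0)]

chunks = list(rows_to_column_chunks(rows, ["id", "name", "price"], chunk_size=3))

# A column-at-a-time aggregate: sum the "price" column chunk by chunk.
total = sum(sum(chunk["price"]) for chunk in chunks)
```

Nothing here is persisted anywhere; the chunks exist only in memory, which loosely mirrors the point above about no data being stored unless you explicitly insert it.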

Does that help?


menaerus 1364 days ago

Yes, that's what I thought, thanks for the explanation.

What happens if the dataset you want to post-process is, let's say, 1 TB in size, or for that matter any size larger than the physical memory available to DuckDB?
