by menaerus 1364 days ago
This is exciting, but it left me wondering whether the approach will continue to scale with larger TPC-H scale factors. A scale factor of 1 is honestly very small.

Also, I didn't quite understand whether DuckDB, in order to achieve this, must:

1. Read the Postgres row formatted data

2. Transform the row formatted data into its internal columnar representation

3. Keep the representation in memory
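To make those three steps concrete, here is a toy Python sketch of what I mean by a row-to-column transformation. It only illustrates the idea and is not DuckDB's actual code; the function name `rows_to_columns` and the sample rows are made up for the example.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Pivot row tuples into one Python list per column (toy illustration)."""
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match the column list")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Step 1: rows as a row store would hand them over (hypothetical data).
rows = [(1, "widget", 9.5), (2, "gadget", 20.0), (3, "gizmo", 4.25)]

# Step 2: transform the rows into a columnar layout.
columns = rows_to_columns(("id", "name", "price"), rows)

# Step 3: the columns stay in memory; an aggregate reads only one of them.
total_price = sum(columns["price"])
```

The point of the layout is visible in the last line: summing `price` never touches `id` or `name`.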

1 comments

You can think of the attach operation as creating views in DuckDB with Postgres tables underneath! DuckDB will then query those Postgres rows (using the typical Postgres wire protocol, except in binary mode).

No data is persisted in DuckDB unless you do an insert statement with the result of the Postgres scan. DuckDB does process that data in a columnar fashion once it has been pulled into DuckDB memory though!
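If the view-versus-insert distinction is unclear, here is a rough stand-in using Python's built-in `sqlite3`, chosen only because it runs without a Postgres server. It is not DuckDB or its Postgres scanner, and the table names (`remote_orders`, `orders_view`, `orders_copy`) are invented for the sketch. The view plays the role of the attached table; the `CREATE TABLE ... AS SELECT` plays the role of inserting the scan result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the table that lives in Postgres (hypothetical data).
conn.execute("CREATE TABLE remote_orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO remote_orders VALUES (?, ?)", [(1, 10.0), (2, 25.0)]
)

# Like the attach: a view stores no rows, it re-reads the source per query.
conn.execute("CREATE VIEW orders_view AS SELECT * FROM remote_orders")

# Like an explicit insert of the scan result: rows are copied and persisted.
conn.execute("CREATE TABLE orders_copy AS SELECT * FROM orders_view")

# The source changes after the copy was taken.
conn.execute("INSERT INTO remote_orders VALUES (3, 40.0)")

view_count = conn.execute("SELECT COUNT(*) FROM orders_view").fetchone()[0]
copy_count = conn.execute("SELECT COUNT(*) FROM orders_copy").fetchone()[0]
```

After the extra row is added, the view sees it and the copy does not, which is the sense in which nothing is persisted until you insert.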

Does that help?

Yes, that's what I thought, thanks for the explanation.

What happens if the dataset you want to post-process is, say, 1 TB, or for that matter any size larger than the physical memory available to DuckDB?