Hacker News
by polskibus 1666 days ago
How does this compare to Postgres + parquet FDW? Is zero copy feasible in Postgres with FDWs?
3 comments

DuckDB is an embedded database. That is, it is a library that you load into your application. This is in contrast to a server database like Postgres, where the database and the application run in different processes (and usually on different machines).
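To make the embedded point concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` purely as a stand-in because it ships with the standard library; the `duckdb` Python package has a similar `connect()` / `execute()` shape, but treat that mapping as an assumption and check its docs. The table and values are made up for illustration.

```python
import sqlite3

# Embedded: the engine is a library running inside this very process.
# There is no server to start and no host, port, or credentials to configure.
# sqlite3 is a stand-in here only because it ships with Python.
con = sqlite3.connect(":memory:")

# Hypothetical toy table, invented for this example.
con.execute("CREATE TABLE trips (city TEXT, fare REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 12.5), ("Oslo", 7.5), ("Lima", 4.0)],
)

# The query runs in-process; results come back as ordinary Python objects.
totals = con.execute(
    "SELECT city, SUM(fare) FROM trips GROUP BY city ORDER BY city"
).fetchall()
print(totals)
con.close()
```

With a server database, the same query would go over a connection to a separate process, which is the distinction drawn above.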

DuckDB is often compared to SQLite, but a more apt comparison might be KDB+, a proprietary, vector-oriented embedded database.

I am not very familiar with the Postgres Parquet FDW, but here is an educated guess!

Postgres is a row-store engine rather than a column store, so I believe it would need to translate Parquet's columnar data into rows before it could process it, and that translation is likely to be substantial (DuckDB and Parquet are both columnar). My hypothesis is that DuckDB would be significantly faster! However, feel free to benchmark things!
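A toy sketch of what that translation means, in plain Python. This is not how Postgres or DuckDB are implemented; the data, column names, and helper functions are all hypothetical and only illustrate the two layouts.

```python
# Hypothetical toy data held the way a columnar format holds it:
# one array per column.
columns = {
    "id": [1, 2, 3, 4],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "fare": [12.5, 4.0, 7.5, 3.0],
}


def columns_to_rows(cols):
    """Pivot column arrays into row tuples, the shape a row store works on."""
    return list(zip(*cols.values()))


def sum_fare_columnar(cols):
    # A columnar engine reads just the one array the query needs.
    return sum(cols["fare"])


def sum_fare_rowwise(rows, fare_index=2):
    # A row engine walks every tuple and picks the field out of each one.
    return sum(row[fare_index] for row in rows)


rows = columns_to_rows(columns)  # the extra translation step
print(rows[0])
print(sum_fare_columnar(columns), sum_fare_rowwise(rows))
```

Both paths give the same answer; the row path just has to pivot every column first, even the ones the query never touches. Whether that cost dominates in practice is exactly what a benchmark would tell you.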

I should also add that there is a DuckDB FDW, so you could have DuckDB read your Parquet files and do the transformations faster before you pull the data into Postgres!

https://github.com/alitrack/duckdb_fdw

This is crazy. Have you measured that Parquet + DuckDB + PostgreSQL is actually faster than Parquet + PostgreSQL?
This may be close to what you're thinking: https://turbodbc.readthedocs.io/en/latest/pages/advanced_usa...

We saw speedups (20%+?), but that is orders of magnitude short of the performance we urge DB vendors to aim for when we do visual analytics integrations. Arrow makes it possible to saturate networks and PCI cards for DB<>GPU transfers, so think 10-50 GB/s.
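For a sense of scale, here is a back-of-the-envelope sketch of what that throughput range means. The 100 GB dataset size is a made-up assumption, and the function ignores latency and setup cost.

```python
def transfer_seconds(size_gb: float, throughput_gb_per_s: float) -> float:
    """Time to move size_gb at a sustained throughput, ignoring latency."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_s


DATASET_GB = 100.0  # hypothetical dataset size, not a figure from this thread

for rate in (10.0, 50.0):  # the two ends of the 10-50 GB/s range
    seconds = transfer_seconds(DATASET_GB, rate)
    print(f"{rate:>4} GB/s -> {seconds:.1f} s")
```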