Hacker News
by Panzerschrek 5 hours ago
If DuckDB is so fast and has no data transfer overheads, does it need all this typical SQL machinery with filtering and joining via SELECT queries? Wouldn't it be simpler and faster to return all data to the caller code (all table rows, but only requested columns) and let it perform all other necessary data processing logic?
2 comments

You’d end up implementing your own home-grown versions of hash join, filter pushdown (skipping Parquet row groups entirely), and so on, plus your own home-grown heuristics for selecting the right one (query planning).

That can outperform a generic solution like this, of course, but in most cases making it faster is not less work.
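
To give a feel for what "home-grown" means here, this is a minimal sketch of an in-memory hash join in plain Python. The function name `hash_join` and the sample rows are made up for illustration. It only covers an inner join on a single key and has none of the planning, spilling, or vectorization a real engine would add.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` (illustrative only)."""
    # Build the hash table on the smaller input, probe with the larger one.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)

    index = defaultdict(list)
    for row in build:
        index[row[key]].append(row)

    joined = []
    for row in probe:
        # Rows whose key is missing from the build side are dropped.
        for match in index.get(row[key], ()):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
users = [{"user_id": 1, "name": "ada"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 3, "total": 5},
]

joined = hash_join(users, orders, "user_id")
```

Even this toy version already has to pick a build side. Deciding between join algorithms, join orders, and when to filter first is the planning part that the database otherwise does for you.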

Also, DuckDB can give you access to an in-memory representation (e.g. `fetch_arrow_table()`), so you have less “language data structure wrapping” overhead, and you can do the filtering yourself on that. In most cases the `WHERE` clauses will win, though.

The SELECT machinery is the product with databases! SQL is often the shortest description of the processing logic, and the database has an efficient local execution engine that can prune or reduce the data it reads based on the plan. That is very hard to match in application code, especially when joins get involved.