|
|
|
|
|
by mildbyte
739 days ago
|
|
Another difference is that this solution uses parquet_fdw, which handles fast scans through Parquet files and filter pushdown via row group pruning, but doesn't vectorize the groupby / join operations above the table scan in the query tree (so you're still using the row-by-row PG query executor in the end). pg_analytics uses DataFusion (dedicated analytical query engine) to run the entire query, which can achieve orders of magnitude speedups over vanilla PG with indexes on analytical benchmarks like TPC-H. We use the same approach at EDB for our Postgres Lakehouse (I'm part of the team that works on it). |
|
My initial intent was to use duckdb for fast vectored query execution but I wasn't able to create a planner / execution hook that uses duckdb internally. Will definitely checkout pg_analytics / Datafusion to see if the same can be integrated here as well. Thanks for the pointers.