by cube2222 1201 days ago
This is really cool!

With their Postgres scanner[0] you can now easily query multiple data sources using SQL and join between them (e.g. a Postgres table with a JSON file). That's something I previously strove to build with OctoSQL[1]. There's even predicate push-down to the underlying databases (for Postgres)!
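To make the idea concrete, here's a minimal sketch of a cross-source join with a pushed-down filter. It is not DuckDB's API or implementation: an in-memory SQLite database stands in for Postgres, a JSON string stands in for the JSON file, and the `users`/`orders` schema and the `scan_users` and `join_with_orders` helpers are all made up for illustration.

```python
import json
import sqlite3

# Stand-in for the remote Postgres table (hypothetical schema and data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ada", "UK"), (2, "linus", "FI"), (3, "grace", "US")],
)

# Stand-in for the JSON file (hypothetical contents).
orders = json.loads(
    '[{"user_id": 1, "total": 30}, {"user_id": 3, "total": 12}, {"user_id": 1, "total": 5}]'
)


def scan_users(country):
    """Scan the table with the filter pushed down into the database query."""
    # The WHERE clause runs inside the database, so only matching rows
    # are handed over to the code doing the join.
    return db.execute(
        "SELECT id, name FROM users WHERE country = ?", (country,)
    ).fetchall()


def join_with_orders(users, orders):
    """Hash join: index the database rows by id, then probe with each JSON record."""
    names = {user_id: name for user_id, name in users}
    return [
        (names[order["user_id"]], order["total"])
        for order in orders
        if order["user_id"] in names
    ]


result = join_with_orders(scan_users("UK"), orders)
print(result)
```

Without push-down, the engine would fetch every row of `users` and filter afterwards; with it, the database only returns the rows that can take part in the join.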

It's amazing to see how quickly DuckDB is adding new features.

I'm not a huge fan of C++, which is currently used for authoring extensions. It'd be really cool if somebody implemented a Rust extension SDK, or even something like what Steampipe[2] does for Postgres FDWs: a shim for quickly implementing non-performance-sensitive extensions for various things.

Godspeed!

[0]: https://duckdb.org/2022/09/30/postgres-scanner.html

[1]: https://github.com/cube2222/octosql

[2]: https://steampipe.io

4 comments

To answer myself, I've found a project which enables extension development for DuckDB using Rust[0].

[0]: https://github.com/Mause/duckdb-extension-framework

Consider Polars for Rust. In my experience it's much, much faster and uses fewer resources than DuckDB or DataFusion.
Polars is a dataframe library, no? That's a quite different use-case.
Oh no, it does lazy query optimization, out-of-core processing... most if not all of the good stuff.
It's not a CLI SQL engine though, is it?
Oh no, for that you'd want DataFusion.
After having tried their PostgreSQL plugin I feel like it's a bit too early to use in production.

One issue is that very few filters are pushed down, and it's unclear which ones. Another is that certain data types aren't handled, which makes it impossible to scan the table at all (even if the column in question isn't used).

I think that DuckDB is also missing a PostgreSQL logical replication driver to continuously replicate a subset of tables you want to run stats on.

Syncing the full table every time is too slow.
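As a rough illustration of why incremental syncing helps, here's a minimal sketch that copies only the rows added since the last sync. Note that this is watermark-based polling, not PostgreSQL logical replication: it only covers append-only tables and misses updates and deletes, which is exactly what a logical replication driver would handle. Both databases are in-memory SQLite stand-ins, and the `events` table and `sync_incremental` helper are hypothetical.

```python
import sqlite3

# Stand-ins for the source (Postgres) and the analytics copy; both are
# in-memory SQLite databases with a hypothetical `events` table.
source = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for conn in (source, replica):
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")


def sync_incremental(source, replica):
    """Copy only rows whose id is above the replica's current high-water mark."""
    (watermark,) = replica.execute(
        "SELECT COALESCE(MAX(id), 0) FROM events"
    ).fetchone()
    new_rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    replica.executemany("INSERT INTO events VALUES (?, ?)", new_rows)
    replica.commit()
    return len(new_rows)


source.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
first = sync_incremental(source, replica)   # copies both existing rows

source.execute("INSERT INTO events (payload) VALUES (?)", ("c",))
second = sync_incremental(source, replica)  # copies only the newly added row

print(first, second)
```

The second call transfers one row instead of three, and that gap grows with table size.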

It's a very exciting time to be working in this space! Going beyond structured databases and file formats like JSON/CSV, there are also systems that can query APIs, source code, ML models, etc.

My own Trustfall query engine is one of them: https://github.com/obi1kenobi/trustfall

For example, you can query the HackerNews APIs from your browser: "Which Twitter/GitHub users comment on stories about OpenAI?" https://play.predr.ag/hackernews#?f=1&q=IyBDcm9zcyBBUEkgcXVl...

One of its real-world use cases is at the core of a Rust semver linter: https://predr.ag/blog/speeding-up-rust-semver-checking-by-ov...