Hacker News new | ask | show | jobs
by cipherself 824 days ago
Interesting, a while back at $EMPLOYER while working in a PoC, I found myself in a situation where I need to take user-provided Postgres SQL queries and run them against one of our APIs. Roughly, I converted the API response to a `pandas dataframe`, I then parsed and transformed the query from the postgres dialect to the duckdb dialect using `sqlglot` and used `duckdb` to query the `pandas dataframe`, coverted it into `json` and returned it to the user.
3 comments

Evidence does something similar - dumping data to parquet:

https://docs.evidence.dev/core-concepts/data-sources

https://news.ycombinator.com/item?id=35645464

I whish the implied ETL step was even clearer from the homepage - it's not really feasible for us to dump entire tables to the dev machines for working with production data - but it is an interesting concept.

That's a really creative way to solve that problem! I would have spun up a temporary pg instance etc, but this is much nicer.
That's interesting. I'm looking to do something similar, but need wire compatibility with postgresql. This allows any postgres client to talk to our service. Didn't have a lot of luck finding a good "middleware"
I think this is probably the missing piece for you? https://github.com/jwills/buenavista
Thanks. This looks promising!
you could consider hosting an empty postgresql database, compile your code as a postgresql foreign data wrapper and expose it as a view. Nothing is more compatible with the postgres wire protocol than postgresql itself ;)

turbot compiles their steampipe plugins in this way. Example: https://github.com/turbot/steampipe-plugin-net

That's a good idea and we considered FDW (for this and more stuff), but having a middleware makes it more flexible - FDW has limitations around pushdown with subselects, we are still constrained to a single postgresql instance for execution, when in theory we could parallelize (certain) queries across nodes.