| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cipherself 824 days ago
	Interesting, a while back at $EMPLOYER while working in a PoC, I found myself in a situation where I need to take user-provided Postgres SQL queries and run them against one of our APIs. Roughly, I converted the API response to a `pandas dataframe`, I then parsed and transformed the query from the postgres dialect to the duckdb dialect using `sqlglot` and used `duckdb` to query the `pandas dataframe`, coverted it into `json` and returned it to the user.

3 comments

e12e 823 days ago

Evidence does something similar - dumping data to parquet:

https://docs.evidence.dev/core-concepts/data-sources

https://news.ycombinator.com/item?id=35645464

I whish the implied ETL step was even clearer from the homepage - it's not really feasible for us to dump entire tables to the dev machines for working with production data - but it is an interesting concept.

link

anitil 823 days ago

That's a really creative way to solve that problem! I would have spun up a temporary pg instance etc, but this is much nicer.

link

neeleshs 823 days ago

That's interesting. I'm looking to do something similar, but need wire compatibility with postgresql. This allows any postgres client to talk to our service. Didn't have a lot of luck finding a good "middleware"

link

cipherself 823 days ago

I think this is probably the missing piece for you? https://github.com/jwills/buenavista

link

neeleshs 823 days ago

Thanks. This looks promising!

link

hantusk 823 days ago

you could consider hosting an empty postgresql database, compile your code as a postgresql foreign data wrapper and expose it as a view. Nothing is more compatible with the postgres wire protocol than postgresql itself ;)

turbot compiles their steampipe plugins in this way. Example: https://github.com/turbot/steampipe-plugin-net

link

neeleshs 823 days ago

That's a good idea and we considered FDW (for this and more stuff), but having a middleware makes it more flexible - FDW has limitations around pushdown with subselects, we are still constrained to a single postgresql instance for execution, when in theory we could parallelize (certain) queries across nodes.

link