Hacker News new | ask | show | jobs
by atonse 1289 days ago
This is wonderful!

I love the pragmatism behind tools like osquery and steampipe that expose a lot of APIs as database tables. It makes these datasets available to non-programmers that are more comfortable with a database/tabular format.

Is it fair to say though, that FDWs have to run as compiled shared libs? I've always wondered if there can be (like with VS code and language servers) a protocol where we can run a generic FDW that speaks a particular API over the network, and then you can build out-of-process connectors to it and just have to know the usual tools (HTTP, JSON, etc).

Thoughts? Maybe this already exists.

Then anyone could potentially build a bespoke API/dataset and just point an FDW to it.

2 comments

> speaks a particular API over the network

it's a interesting idea, and one of the things that we were toying with in our pg_net extension (https://github.com/supabase/pg_net). This is a "generic" async network extension, so you can fetch/put/post. It works well for APIs.

I think the generic approach works for some things where the data is less "fixed" - for example, an OpenAI API endpoint.

But for "fixed" data (data warehouses), the wrapper usually needs some custom work for security, protocols, and "push down". I'll be interested to get HN's take on this - they might have some suggestions for us for this framework

I don't think so... in your example those are functions we just call in SQL right? I'm talking about exposing arbitrary data as a table like an FDW, but without having to compile a whole module.

So more like, (this is a bit contrived but bear with me):

Let's say I wanted to make a table that lists all my keybase proofs (so sad Keybase is basically dead, but I digress).

I'd build a graphql endpoint where you can query those proofs.

Then I'd have to define a metadata endpoint that maybe returns a JSON schema?

Then you can tell the FDW to define a virtual table that (as much as possible) uses the JSON schema to generate columns in this virtual table.

You'd define the FDW to say, here's the virtual table (like you'd do now), and then put in an option to say "you'll get the mappings for this table with the JSON schema returned by this metadata endpoint" and then tell the FDW "send select queries to this REST/graphql endpoint" which would be something in our example that returns keybase proofs).

I wouldn't need to build a whole new keybase FDW. I'd just use the generic one to hit that existing endpoint.

Is that making sense? It's kind of a dynamic FDW that can be configured for the long tail of API endpoints.

Yes that makes sense.

> define a metadata endpoint that maybe returns a JSON schema

We could use an OpenAPI spec[0], which many REST services expose as a JSON endpoint. I think that could work: develop a generic "OpenAPI" FDW, which you can connect like this:

    create server example_api
      foreign data wrapper example_api_wrapper
      options (
        open_api_url 'https://api.example.com/'
      );
Then we'd map:

    GET -> SELECT (which would map selected columns into a table)
    POST -> INSERT
    PATCH -> UPDATE
    DELETE -> DELETE

[0] OpenAPI: https://swagger.io/specification/
Right!

I’m wondering if this is how multicorn works even now. It seems to be an out of process thing. I wasn’t aware of multicorn before.

But python… wish it was elixir :-)