

by chatmasta 1205 days ago

I'll make this same offer for Splitgraph :) If you feel like writing a Postgres FDW then we can add it to the engine on the backend, so that anyone with a Postgres client could connect to postgres://data.splitgraph.com:5432 and SELECT from a table backed by crul (either "mounted" for live querying, and/or ingested once/periodically for subsequent querying). The user just needs to provide parameters for the table; it's up to the FDW how to interpret those parameters.

It would take some thinking and planning, and it's possibly not even a good idea ;) But generally any "data source" is packageable as an FDW as long as you can model it in such a way that you can reasonably implement certain functions for operations like table scans. For most FDWs, this is easy and the cost of a large query is usually limited to excess bandwidth and latency while the query executor reads the result from the FDW. But with a live source pointing to a crawler instance, a table scan could in the worst case mean waiting for the crawler to parse the responses to hundreds of rate-limited network requests. So it's probably better to ingest the data once (and/or periodically) for a particular crul "table" (whatever you decide that means) rather than to query it live.
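To make the "table scan" idea concrete, here is a rough sketch in plain Python. It loosely follows the shape of a Python FDW class (a constructor that receives the table options and columns, plus an `execute(quals, columns)` method that yields rows), but it does not import any FDW framework. Everything crawler-related is an assumption for illustration: `CrawlerTableSketch`, the `urls` option, `fake_fetch`, and `min_interval` are invented names, not part of crul or Splitgraph.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Optional


class CrawlerTableSketch:
    """Hypothetical FDW-style table whose rows come from a crawler."""

    def __init__(self, options: Dict[str, str], columns: List[str],
                 fetch: Callable[[str], Dict[str, str]],
                 min_interval: float = 0.0) -> None:
        # `options` stands in for the table parameters the user provides.
        # The comma-separated "urls" key is an invented convention.
        self.urls = [u for u in options.get("urls", "").split(",") if u]
        self.columns = columns
        self.fetch = fetch
        self.min_interval = min_interval  # seconds between requests

    def execute(self, quals: Iterable[object],
                columns: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
        # A table scan: one network request per row. `quals` is ignored
        # here; a real wrapper could use it to skip requests entirely.
        wanted = list(columns)
        for i, url in enumerate(self.urls):
            if i and self.min_interval:
                time.sleep(self.min_interval)  # crude rate limiting
            page = self.fetch(url)
            yield {c: page.get(c) for c in wanted}


def fake_fetch(url: str) -> Dict[str, str]:
    # Stand-in for the crawler; a real one would do network I/O.
    return {"url": url, "title": "Title of " + url, "body": "..."}


table = CrawlerTableSketch(
    {"urls": "https://example.com/a,https://example.com/b"},
    ["url", "title", "body"],
    fake_fetch,
)
rows = list(table.execute(quals=[], columns=["url", "title"]))
```

With a nonzero `min_interval` and hundreds of URLs, the generator makes the worst case visible: the query executor cannot finish the scan until every rate-limited request has returned.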

Fortunately, you can still write an FDW as the adapter layer, because Splitgraph ingests data on a schedule by querying the FDW of the live data source (while tolerating a long-running query). Alternatively (or additionally) you could write an Airbyte adapter which we also support, but only for ingestion - if you want live queryable tables then an FDW is necessary.
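As a rough illustration of "ingest on a schedule by querying the live source", the sketch below runs a slow scan once and replaces a local snapshot table with the result, so later queries never touch the crawler. This is not how Splitgraph implements ingestion; `slow_scan`, `ingest_snapshot`, and the `crawl_snapshot` table are hypothetical names, and an in-memory SQLite database stands in for the real store.

```python
import sqlite3
from typing import Callable, Iterator, Tuple

Row = Tuple[str, str]


def slow_scan() -> Iterator[Row]:
    # Hypothetical stand-in for a long-running scan of the live source.
    for n in range(3):
        yield (f"https://example.com/{n}", f"Page {n}")


def ingest_snapshot(conn: sqlite3.Connection,
                    scan: Callable[[], Iterator[Row]]) -> int:
    """Run the scan once and swap the results into a snapshot table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_snapshot "
        "(url TEXT PRIMARY KEY, title TEXT)"
    )
    rows = list(scan())  # the only slow step; readers are not blocked on it
    with conn:  # one transaction: old snapshot out, new snapshot in
        conn.execute("DELETE FROM crawl_snapshot")
        conn.executemany(
            "INSERT INTO crawl_snapshot (url, title) VALUES (?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
ingested = ingest_snapshot(conn, slow_scan)  # a scheduler would call this periodically
titles = [t for (t,) in conn.execute(
    "SELECT title FROM crawl_snapshot ORDER BY url")]
```

The design choice is that only the ingestion job has to tolerate the long-running query; anything reading `crawl_snapshot` sees ordinary table latency.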

We've been interested in adding something like this (think Apify + Postgres) for a while. If done well it could be really cool. Let me know if you want to talk about it: miles@splitgraph.com

2 comments

dan_rock_wilson 1205 days ago

How did you integrate with all those services (the third party service APIs, not asking about the DBs)? I've seen a bunch of sites do this and I'm curious if there's some open source library I should be using, so that I don't need to write from scratch each time I'd like to integrate with another service.

Edit: just saw "airbyte" in a bunch of places, which I assume answers my question. So updated question: airbyte works well for ya?


portInit 1205 days ago

We hand-wrote a number of integrations. Sometimes it was as simple as reusing a schema with slightly different values. We are also using the awesome https://www.benthos.dev/!


dan_rock_wilson 1205 days ago

Thanks for the info, Benthos has not been on my radar, will check it out.


portInit 1205 days ago

We'll have to see how close our current postgres integration is! Would like to understand more and will reach out.


chatmasta 1205 days ago

Awesome, looking forward to it.

I just got to the section on destinations ("Stores"). Very cool. If you're building an Enterprise plan where you manage the infrastructure for your customers, we can set up a dedicated white-labeled deployment of Splitgraph Cloud on any of the three major clouds. Perhaps you could find a use for that in your backend infrastructure.

ETA: Also, if you can write to Postgres, you can write to the Splitgraph DDN [0] with most DML and DDL statements, including INSERT and CREATE TABLE. So even without any FDW, you might be able to add Splitgraph as a "destination" for your users (who would for the most part just need to provide an API keypair).

[0] https://www.splitgraph.com/docs/add-data/from-ddn
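A "destination" along these lines could be as small as a function that issues CREATE TABLE and INSERT over any Python DB-API connection. The sketch below is an assumption-heavy illustration: `push_rows` and the `crawl_results` table are hypothetical, every column is typed TEXT for simplicity, and an in-memory SQLite connection stands in for a Postgres one. With a Postgres driver the `placeholder` would typically be `%s` rather than `?`. Whether the DDN accepts these exact statements is not verified here.

```python
import sqlite3
from typing import Iterable, Sequence


def push_rows(conn, table: str, columns: Sequence[str],
              rows: Iterable[Sequence[str]], placeholder: str = "?") -> int:
    """Create a table and insert rows using plain DDL and DML."""
    # Identifiers cannot be bound as parameters, so check them before
    # interpolating them into the SQL text.
    if not all(name.isidentifier() for name in (table, *columns)):
        raise ValueError("table and column names must be simple identifiers")
    col_defs = ", ".join(f"{c} TEXT" for c in columns)
    col_list = ", ".join(columns)
    marks = ", ".join(placeholder for _ in columns)
    data = [tuple(r) for r in rows]
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE {table} ({col_defs})")
    cur.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({marks})", data
    )
    conn.commit()
    return len(data)


conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
written = push_rows(
    conn,
    "crawl_results",
    ["url", "title"],
    [("https://example.com/a", "A"), ("https://example.com/b", "B")],
)
stored = conn.execute("SELECT COUNT(*) FROM crawl_results").fetchone()[0]
```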
