by portInit
1209 days ago
Will absolutely reach out! In our experience, just getting the data is often the hard part, so we've focused on that piece and on making it easy to share with destinations that are purpose-built for analytics, viz, etc. Thanks for checking it out!
It would take some thinking and planning, and it's possibly not even a good idea ;) But generally, any "data source" can be packaged as an FDW (foreign data wrapper) as long as you can model it so that you can reasonably implement the functions behind operations like table scans. For most FDWs this is easy, and the cost of a large query is usually limited to excess bandwidth and latency while the query executor reads the result from the FDW. But with a live source pointing to a crawler instance, a table scan could in the worst case mean waiting for the crawler to parse the responses to hundreds of rate-limited network requests. So it's probably better to ingest the data once (and/or periodically) for a particular crul "table" (whatever you decide that means) rather than to query it live.
Fortunately, you can still write an FDW as the adapter layer, because Splitgraph ingests data on a schedule by querying the FDW of the live data source (while tolerating a long-running query). Alternatively (or additionally), you could write an Airbyte adapter, which we also support, but only for ingestion. If you want live queryable tables, then an FDW is necessary.
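To make the live-vs-ingested tradeoff concrete, here's a rough sketch in plain Python. It is not Splitgraph or crul code: `CrawlerTable`, `IngestedTable`, `fake_fetch`, and the `execute(quals, columns)` method shape are all made up for illustration (the shape is loosely modeled on how Python FDW libraries expose a table scan, which is an assumption here). The point is only that a scan of the live source costs one slow request per row, while a snapshot pays that cost once per refresh.

```python
import time
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class CrawlerTable:
    """Hypothetical live source: every scan re-runs a rate-limited crawl."""

    def __init__(self, fetch_page: Callable[[str], Row], urls: List[str],
                 delay_s: float = 0.0):
        self.fetch_page = fetch_page  # stand-in for a real crawler call
        self.urls = urls
        self.delay_s = delay_s        # simulated rate limit between requests
        self.requests_made = 0

    def execute(self, quals: list, columns: List[str]) -> Iterator[Row]:
        # A table scan: one request per row. `quals` (filters) are ignored
        # in this sketch, so the caller would have to filter the rows itself.
        for url in self.urls:
            time.sleep(self.delay_s)
            self.requests_made += 1
            row = self.fetch_page(url)
            yield {c: row.get(c) for c in columns}


class IngestedTable:
    """Snapshot filled by scanning the live source once (or on a schedule)."""

    def __init__(self, source: CrawlerTable, columns: List[str]):
        self.source = source
        self.columns = columns
        self.rows: List[Row] = []

    def refresh(self) -> None:
        # The one long-running scan; a scheduler would call this periodically.
        self.rows = list(self.source.execute([], self.columns))

    def execute(self, quals: list, columns: List[str]) -> Iterator[Row]:
        # Queries read the stored rows and never touch the crawler.
        for row in self.rows:
            yield {c: row.get(c) for c in columns}


def fake_fetch(url: str) -> Row:
    # Hypothetical crawler result, no real network access.
    return {"url": url, "title": "page at " + url}


live = CrawlerTable(fake_fetch, ["a", "b", "c"])
snapshot = IngestedTable(live, ["url", "title"])
snapshot.refresh()

urls_only = list(snapshot.execute([], ["url"]))
titles_only = list(snapshot.execute([], ["title"]))
```

After the single `refresh()`, both queries are served from the snapshot, so `live.requests_made` stays at 3 no matter how many times the snapshot is queried. Querying `live.execute(...)` directly would add three more requests per scan.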
We've been interested in adding something like this (think Apify + Postgres) for a while. If done well it could be really cool. Let me know if you want to talk about it: miles@splitgraph.com