|
|
|
|
|
by ignoramous
1330 days ago
|
|
> We're heavily using it to build Seafowl, an analytical database that's optimized for running SQL queries directly from the user's browser... Interesting. Where does seafowl fit in when I compare it with, say, data-stack-in-a-box approach, for ex: meltano + dbt + duckdb + superset [0]? Is my thinking right that seafowl possibly replaces both duckdb (with IOx) and superset (if there's a web front-end)? Incidentally, dagster had an article up just yesterday making a case for poor-man's datalake with dbt + dagster + duckdb [1]. What does splitgraph replace if I were to use it in a similar setup? Thanks. [0] https://archive.is/DxU1e [1] https://archive.is/5ikU4 |
|
It's a fairly different use case from DuckDB (query execution for Web applications vs fast embedded analytical database for notebooks) and the rest of the modern data stack (which mostly is about analytics internal to a company). Just to clarify, we're not related to IOx directly (only via us both using Apache DataFusion).
If we had to place Seafowl _inside_ of the modern data stack, it'd be mostly a warehouse, but one that is optimized for being queried from the Internet, rather than by a limited set of internal users. Or, a potential use case could be extracting internal data from your warehouse to Seafowl in order to build public applications that use it.
We don't currently ship a Web front-end and so can't serve as a replacement to Superset: it's exposed to the developer as an HTTP API that can be queried directly from the end user's Web browser. But we have some ideas around a frontend component: some kind of a middleware, where the Web app can pre-declare the queries it will need to run at build time and we can compute some pre-aggregations to speed those up at runtime. Currently we recommend querying it with Observable [0] for an end-to-end query + visualization experience (or use a different viz library like d3/Vega).
Re: the second question about Splitgraph for a data lake, the intention behind Splitgraph is to orchestrate all those tools and there the use case is indeed the modern data stack in a box. It's kind of similar to dbt Labs's Sinter [1] which was supposed to be the end-to-end data platform before they focused on dbt and dbt Cloud instead: being able to run Airbyte ingestion, dbt transformations, be a data warehouse (using PostgreSQL and a columnar store extension), let users organize and discover data at the same time. There's a lot of baggage in Splitgraph though, as we moved through a few iterations of the product (first Git/Docker for data, then a platform for the modern data stack). Currently we're thinking about how to best integrate Splitgraph and Seafowl in order to build a managed pay-as-you-go Seafowl, kind of like Fauna [2] for analytics.
Hope this helps!
[0] https://observablehq.com/@seafowl/interactive-visualization-...
[1] https://www.getdbt.com/blog/whats-in-a-name/
[2] https://fauna.com/