|
|
|
|
|
by cjonas
82 days ago
|
|
We just create mini data "ponds" on the fly by copying tenant isolated gold tier data to parquet in s3. The users/agent queries are executed with duckdb. We run this process when the user start a session and generate an STS token scoped to their tenant bucket path. Its extremely simple and works well (at least with our data volumes). |
|
I didn't use the in browser WASM but I did expose an api endpoint that passed data exploration queries directly to the backend like a knock off of what new relic does. I also use that same endpoint for all the graphs and metrics in the UI. Just filtered out the write / delete statements in a rudimentary way.
DuckDB is phenomenal tech and I love to use it with data ponds instead of data lakes although it is very capable of large sets as well.
And "data pond"? Glad I am not alone using this term! Somewhere between a data lake and warehouse - still unstructured but not _everything_ in one place. For instance, if I have a multi-tenant app I might choose to have a duckdb setup for each customer with pre-filtered data living alongside some global unstructured data.
Maybe there's already a term that covers this but I like the imagery of the metaphor... "smaller, multiple data but same idea as the big one".