

by milicevica23 920 days ago

Hi, Aleks here, one of the authors. Thank you very much for your comment.

We have been running this stack in production for the last few months. It has its downsides (I would argue mostly due to the young ecosystem) and its upsides, which we try to explain in the blog. We wanted to concentrate on the concepts and the technological improvements that allow us to run such a stack, and to explain how we see the next steps forward.

1. Running a warehouse is not a bad idea, but you must always be careful to separate storage from compute so that it can scale. I experienced the limitations of such a system, as described in this blog: https://delta.io/blog/2022-09-14-why-migrate-lakehouse-delta... You end up in a tough position if your solution works well but does not scale. Ideally, run it with external tables so that the data is visible without access to the engine.
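To make the storage/compute split concrete, here is a toy sketch using only the Python standard library. It assumes CSV files in a shared directory as a stand-in for an open table format (such as Delta or Parquet on object storage) and an in-memory sqlite3 database as a stand-in for a short-lived compute engine. The function names are hypothetical. The point is that the engine only reads and writes files, so once it shuts down, any other consumer can still read the data directly.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


# Storage layer: plain files in a shared location. CSV stands in here for an
# open table format such as Delta/Parquet (an assumption for this sketch).
def write_table(path: Path, header: list[str], rows: list[tuple]) -> None:
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Compute layer: a short-lived engine (in-memory sqlite3) that reads the files,
# runs a transformation, and writes the result back to storage.
def run_transformation(src: Path, dst: Path) -> None:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        with src.open(newline="") as f:
            reader = csv.DictReader(f)
            con.executemany(
                "INSERT INTO orders VALUES (?, ?)",
                ((r["customer"], float(r["amount"])) for r in reader),
            )
        result = con.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        con.close()  # the engine goes away; the data stays in storage
    write_table(dst, ["customer", "total"], result)


# Any other consumer can read the output directly, without the engine.
def read_table(path: Path) -> list[dict]:
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    storage = Path(tempfile.mkdtemp())
    write_table(storage / "orders.csv", ["customer", "amount"],
                [("a", 10.0), ("b", 5.0), ("a", 2.5)])
    run_transformation(storage / "orders.csv", storage / "totals.csv")
    print(read_table(storage / "totals.csv"))
```

In a real setup the files would live on object storage and the engine would be something like a warehouse reading external tables, but the shape is the same: compute can be scaled, replaced, or switched off independently of where the data lives.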

Another limitation is the metastore for your tables and metadata, which in such a setup you usually have per workspace/environment. Databricks' Unity Catalog is an excellent way to solve this, but it is only compatible with some engines.
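The idea of a single catalog shared across engines and environments, instead of one metastore per workspace, can be sketched as below. This is a hypothetical, minimal file-backed catalog (the `FileCatalog` class, its methods, and the `s3://bucket/...` location are all made up for illustration). It is not the Unity Catalog API or any other product's interface. It only shows that when every engine resolves table names through the same catalog, a table registered by one is immediately visible to the others.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical minimal catalog: one JSON document shared by all engines and
# environments, mapping a table name to its storage location and schema.
# This is NOT the Unity Catalog API; it only illustrates the idea.
class FileCatalog:
    def __init__(self, path: Path) -> None:
        self.path = path
        if not path.exists():
            path.write_text(json.dumps({}))

    def _load(self) -> dict:
        return json.loads(self.path.read_text())

    def register_table(self, name: str, location: str,
                       schema: dict[str, str]) -> None:
        tables = self._load()
        tables[name] = {"location": location, "schema": schema}
        self.path.write_text(json.dumps(tables, indent=2))

    def resolve_table(self, name: str) -> dict:
        tables = self._load()
        if name not in tables:
            raise KeyError(f"table {name!r} is not registered")
        return tables[name]


if __name__ == "__main__":
    catalog_file = Path(tempfile.mkdtemp()) / "catalog.json"
    # Two different engines/workspaces open the same catalog file...
    engine_a = FileCatalog(catalog_file)
    engine_b = FileCatalog(catalog_file)
    # Hypothetical storage location, for illustration only.
    engine_a.register_table("sales.orders", "s3://bucket/sales/orders/",
                            {"customer": "string", "amount": "double"})
    # ...so a table registered by one is immediately visible to the other.
    print(engine_b.resolve_table("sales.orders")["location"])
```

A production catalog also has to deal with concurrent writers, access control, and the fact that each engine must understand the catalog's protocol, which is exactly where engine compatibility becomes the limiting factor.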

2. We do not think this stack exists to replace Snowflake or BigQuery, but rather to take part of the workload (data transformation) away from them and let PaaS solutions do what they are made for: user interface and interaction.