by arjunnarayan
1928 days ago
Thank you for your kind words! We do indeed have plenty of work to do (and are thus hiring)! I'm curious, however, why you think this requires you to be all-in on Materialize. As you said better than I could have, dbt is amazing at keeping your business logic organized. Our intention is very much for dbt to standardize the modeling/business logic layer, which allows you to use multiple backends as you see fit while sharing the catalog layer cleanly.

Our hope is that you have some BigQuery/Snowflake job that you're tired of running up the bill on by hitting redeploy 5 times a day, and that you can port it over to Materialize with little work because the adapter takes care of any small semantic differences in date handling, null handling, etc. So Materialize sits cleanly side-by-side with Snowflake/BigQuery, and you choose whether you want things incrementally maintained with a few seconds of latency by Materialize, or refreshed once a day by the batch systems.

My view is that you're likely going to want to do data science with a batch system (when you're in "learning mode" you try to keep as many things fixed as possible, including not updating the dataset). Then, if the model becomes a critical automated pipeline, rather than rerunning the model every hour and uploading results to a Redis cache or something, you switch it over to Materialize and don't ever have to worry about cache invalidation.
|
Then you could use that static store for exploration/fixed analysis or even initial development of dbt models for the Materialize layer, using the Snowflake or Spark connectors at first. When something's ready for production use, migrate it to your Materialize dbt project.
Given the way dbt currently works with backend switching (and the divergence of SQL dialects with respect to things like date functions and unstructured data), maintaining the batch and streaming layers side by side in dbt would be less wasteful than the current paradigm of completely separate tooling, but it would still be a big source of overhead and synchronization errors.
If the community comes up with a good narrative for CI/CD and data testing in flight with the above, I don't think I'd even hesitate to pull the trigger on a migration. The best part is half of your potential customers already have their business logic in dbt.
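To make the dialect-divergence point concrete, here is a minimal Python sketch of the kind of per-backend rendering an adapter layer has to do for something as small as truncating a timestamp to the day. This is not dbt's or Materialize's actual adapter code: the function names, the `orders` table, and the `created_at` column are all hypothetical, and only the three SQL function spellings are taken from the respective dialects.

```python
# Hypothetical illustration of per-backend SQL rendering.
# Not dbt's or Materialize's real adapter code; all names here are made up.
from typing import Callable, Dict


def _trunc_snowflake(part: str, column: str) -> str:
    # Snowflake: DATE_TRUNC('<part>', <expr>)
    return f"DATE_TRUNC('{part}', {column})"


def _trunc_bigquery(part: str, column: str) -> str:
    # BigQuery: TIMESTAMP_TRUNC(<expr>, <PART>), argument order is reversed
    return f"TIMESTAMP_TRUNC({column}, {part.upper()})"


def _trunc_materialize(part: str, column: str) -> str:
    # Materialize follows the PostgreSQL spelling: date_trunc('<part>', <expr>)
    return f"date_trunc('{part}', {column})"


DATE_TRUNC_BY_BACKEND: Dict[str, Callable[[str, str], str]] = {
    "snowflake": _trunc_snowflake,
    "bigquery": _trunc_bigquery,
    "materialize": _trunc_materialize,
}


def date_trunc(backend: str, part: str, column: str) -> str:
    """Render a timestamp truncation expression for the given backend."""
    try:
        render = DATE_TRUNC_BY_BACKEND[backend]
    except KeyError:
        raise ValueError(f"no date_trunc rendering for backend {backend!r}") from None
    return render(part, column)


def daily_orders_sql(backend: str) -> str:
    """One shared model definition, rendered per backend."""
    day = date_trunc(backend, "day", "created_at")
    return f"SELECT {day} AS order_day, COUNT(*) AS orders FROM orders GROUP BY 1"


if __name__ == "__main__":
    for name in DATE_TRUNC_BY_BACKEND:
        print(f"{name}: {daily_orders_sql(name)}")
```

The model body is written once and only the dialect-specific fragment varies, which is the part that keeps side-by-side batch and streaming projects from drifting apart. Every function that differs across backends needs its own entry like this, which is where the maintenance overhead comes from.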