I'm routinely confused about the role these sort of tools play. I conceptually understand what they're doing but weren't these problems solved a long time ago with materialized views and foreign data wrappers?
The hard part of ETL for me has always been gracefully handling the outliers, the data that you need to look at within the context of multiple rows (e.g. duplicate rows that aren't exactly the same so it's not a simple SELECT DISTINCT) in order to make the correct decision, or the entity matching/mapping/categorization that often is necessary. Having the lookup tables that often need manual oversight when new entries show up that haven't yet been properly tagged. Or if you're ingesting address data and you want to normalize it through a geocoder, where exactly does that occur in these SQL-templating "pipelines"?
I feel like these are basically focused on moving data between different DBs and generating group-by queries to populate some rollup tables?
Indeed. Dataform however only does the T (transformation) in ELT. Most businesses are centralising an exploding diversity of raw data in the data warehouse and Dataform helps them manage the data there.
One other main difference with a product like composable is that Dataform is built with software engineering best practices in mind. The transformations are written in code (SQL) rather than a GUI. It enables version control, reusable components for example and makes it easier to manage a large number of interdependent transformations.
The hard part of ETL for me has always been gracefully handling the outliers, the data that you need to look at within the context of multiple rows (e.g. duplicate rows that aren't exactly the same so it's not a simple SELECT DISTINCT) in order to make the correct decision, or the entity matching/mapping/categorization that often is necessary. Having the lookup tables that often need manual oversight when new entries show up that haven't yet been properly tagged. Or if you're ingesting address data and you want to normalize it through a geocoder, where exactly does that occur in these SQL-templating "pipelines"?
I feel like these are basically focused on moving data between different DBs and generating group-by queries to populate some rollup tables?