by aaronsteers
959 days ago
Thanks for this feedback! I do agree there are some similarities, which I called out as common benefits of using "EL pairs" on both sides of the process. Here are my thoughts, though, on why the distinction matters. The first place you land the data is almost always a place you control: either a data warehouse or a data lake that you have tuned for fast and flexible data processing. The second (publish) process pushes to a location you most likely can't control, and which is not prepared to receive raw/unshaped data. This is important because the business logic in our transformations will almost always evolve over time. Running the transforms between EL and P (the second "EL") gives us reproducibility and the efficiency to innovate, in the location where we have the best performance profile for running them. What do you think?
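A minimal sketch of the EL, then T, then P flow, assuming an in-memory SQLite database stands in for the warehouse you control. The table names, sample rows, and the `include_refunds` rule are hypothetical, chosen only for illustration. Because the raw data stays landed, an evolved transform can be rebuilt and republished without re-extracting anything from the source.

```python
import sqlite3

# Hypothetical raw records standing in for what an EL tool would extract from a source system.
RAW_ORDERS = [
    ("o1", "alice@example.com", 1250, "paid"),
    ("o2", "bob@example.com", 900, "refunded"),
    ("o3", "alice@example.com", 300, "paid"),
]


def extract_load(conn):
    """EL: land the raw, unshaped data in the warehouse we control (SQLite here, an assumption)."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, email TEXT, cents INTEGER, status TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform(conn, include_refunds=False):
    """T: business logic runs in the warehouse and can be rebuilt from raw data at any time."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    # The filter is a fixed internal string, not user input, so formatting it into SQL is safe here.
    status_filter = "" if include_refunds else "WHERE status = 'paid'"
    conn.execute(
        f"""CREATE TABLE customer_revenue AS
            SELECT email, SUM(cents) / 100.0 AS revenue
            FROM raw_orders {status_filter}
            GROUP BY email"""
    )


def publish(conn):
    """P: read only the shaped output, ready to push to a destination we don't control."""
    return conn.execute("SELECT email, revenue FROM customer_revenue ORDER BY email").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_load(conn)
    transform(conn)
    print(publish(conn))  # [('alice@example.com', 15.5)]
    # The business logic evolves: rebuild from the landed raw data, no re-extraction needed.
    transform(conn, include_refunds=True)
    print(publish(conn))  # [('alice@example.com', 15.5), ('bob@example.com', 9.0)]
```

The point of the design is that `transform` only ever reads `raw_orders`, so changing the rule is a cheap, repeatable rebuild inside the warehouse rather than a change to what gets extracted or pushed.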
I'm not convinced the distinction is important enough to warrant anything other than bucketing it under Reverse ETL, and I think the terms introduced (ELTP and "EL Pairs") create less clarity, not more.
> pushes to a location you most likely can't control
Even for internal data hand-offs, this is usually the case. Unless the same team is doing both the ETL work and building the app that uses the output, the data team is delivering something that was signed off by the receiving team.
> not prepared to receive raw/unshaped data
So, like all Reverse ETL, it requires some sort of integration boundary for data delivery. That could be an API, a CSV file uploaded to an FTP server, or schema'd JSON read from Kafka. In every instance, the data team needs to tailor the output to the receiver.
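To make the "tailor the output to the receiver" step concrete, here is a minimal sketch of the CSV-file case. It assumes the warehouse output arrives as `(email, revenue)` pairs and that the receiving team has signed off on a hypothetical contract: their own column names and revenue formatted to two decimal places. The upload to an FTP server is left out; only the shaping is shown.

```python
import csv
import io

# Hypothetical contract agreed with the receiving team: their column names, in their order.
RECEIVER_COLUMNS = ["CustomerEmail", "LifetimeValueUSD"]


def to_receiver_csv(rows):
    """Shape (email, revenue) rows into the receiver's agreed CSV layout and return it as text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(RECEIVER_COLUMNS)
    for email, revenue in rows:
        # Assumed contract detail: revenue is sent as text with exactly two decimal places.
        writer.writerow([email, f"{revenue:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_receiver_csv([("alice@example.com", 15.5), ("bob@example.com", 9.0)]), end="")
    # CustomerEmail,LifetimeValueUSD
    # alice@example.com,15.50
    # bob@example.com,9.00
```

The same shaping step would exist for an API payload or a Kafka message; only the serialization at the boundary changes.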