by denimnerd42
601 days ago
I led a team on a large data project at an enormous bank, with hundreds of devs on the project across 3 continents. My team took care of the integration and automation of the SDLC process. We moved from several generations of ETL applications (9 applications: Netezza, Teradata, mainframes, Hive MapReduce) all to Spark. The project was a huge cost savings and a great success, with massive risk reduction from getting these systems all under 1 roof. We found a lot of issues with the original data. We automated the lineage generation, data quality, data integrity, etc. We developed a framework that made everything batteries included. Transformations were done as a linear set of SQL steps, or a DAG of SQL steps if required. You could do more complicated things in reusable plugins if needed. We also had a rock solid old school scheduler application. We had thousands of these jobs. We had an automated data comparison tool that cataloged old data and ran the original code vs the new code on the same set of data. I don't think it's impossible to pull off, but it was a hard project for sure. Grew my career a ton.
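As a rough illustration of the "DAG of SQL steps" idea and the old-vs-new comparison, a minimal sketch might look like the following. It is not the framework described above: sqlite3 stands in for Spark, and the step names, tables, and the `run_dag` / `compare_tables` helpers are all hypothetical.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical steps: name -> (dependencies, SQL that builds a table of that name).
STEPS = {
    "clean_txn": (set(), "SELECT id, amount FROM raw_txn WHERE amount IS NOT NULL"),
    "big_txn": ({"clean_txn"}, "SELECT id, amount FROM clean_txn WHERE amount >= 100"),
    "summary": (
        {"clean_txn", "big_txn"},
        "SELECT (SELECT COUNT(*) FROM clean_txn) AS n_all, "
        "(SELECT COUNT(*) FROM big_txn) AS n_big",
    ),
}


def run_dag(conn, steps):
    """Materialize each step as a table, dependencies first."""
    graph = {name: deps for name, (deps, _sql) in steps.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        conn.execute(f"CREATE TABLE {name} AS {steps[name][1]}")
    return order


def compare_tables(conn, old, new):
    """Return rows found only in `old` and only in `new` (set semantics)."""
    only_old = conn.execute(
        f"SELECT * FROM {old} EXCEPT SELECT * FROM {new}"
    ).fetchall()
    only_new = conn.execute(
        f"SELECT * FROM {new} EXCEPT SELECT * FROM {old}"
    ).fetchall()
    return only_old, only_new


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_txn (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_txn VALUES (?, ?)",
    [(1, 50.0), (2, 150.0), (3, None), (4, 100.0)],
)
order = run_dag(conn, STEPS)

# Stand-in for the output of the "original code" on the same input data.
conn.execute(
    "CREATE TABLE legacy_big_txn AS SELECT id, amount FROM raw_txn WHERE amount > 100"
)
only_old, only_new = compare_tables(conn, "legacy_big_txn", "big_txn")
print(order, only_old, only_new)
```

Here the comparison flags the row with `amount = 100`, which the made-up legacy query (`>`) excludes and the new step (`>=`) includes. A real comparison tool would also need to handle duplicate rows, type coercion, and ordering, which `EXCEPT` alone does not cover.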
I know startups that hired data engineers, deployed warehouses, DBT, and a BI tool, and churned out hundreds of reports; in one case their DBT project had hundreds of files. No one in that company knew why any of it was used.
All said and done, the business users wanted three reports.
More often than not, data teams are more self-serving than anything else.