by sherifnada
1334 days ago
In some sense, data engineering today is where software engineering was a decade ago:

- Infrastructure as code is not the norm. Most tools are UI-focused. It's the equivalent of setting up your infra via the AWS UI.
- Prod/staging/dev environments are not the norm.
- Version control is not a first-class concept.
- DRY and component re-use are exceedingly difficult (how many times have you walked into a meeting where 3 people had 3 different definitions of the same metric?).
- API interfaces are rarely explicitly defined, and fickle when they are (the hot name for this nowadays is "data contracts").
- Unit/integration/acceptance testing is not nearly as ubiquitous as it is in software.

On the bright side, I think this means DE doesn't need to re-invent the wheel on a lot of these issues. We can borrow a lot from software engineering.
|
Unit testing is the only thing we tend to skip, mainly because it's more reliable to allow for fluidity in the data that's being ingested, which is really easy now that so many databases support automatic schema detection. External APIs can change without notice, so it's better to just design for that, then use the time you would have spent on unit tests to build alerts around automated data validation.
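
To make that concrete, here's a rough sketch of what "alerts around automated data validation" could look like. Everything in it is made up for illustration (the `REQUIRED_FIELDS` expectations, the `ValidationReport` class, the 5% threshold); it's not any particular tool's API. The idea is to let unknown fields through, count what's missing or mistyped, and only raise an alert when a batch drifts past a threshold.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable

# Hypothetical expectations for an ingested feed (illustrative names only).
REQUIRED_FIELDS: dict[str, tuple[type, ...]] = {
    "id": (int,),
    "amount": (int, float),
}


@dataclass
class ValidationReport:
    total: int = 0
    missing: dict[str, int] = field(default_factory=dict)
    wrong_type: dict[str, int] = field(default_factory=dict)
    new_fields: set[str] = field(default_factory=set)

    def alerts(self, max_bad_ratio: float = 0.05) -> list[str]:
        """Return alert messages for anything past the tolerated ratio."""
        messages: list[str] = []
        if self.total == 0:
            return messages
        for label, counts in (("missing", self.missing), ("wrong type", self.wrong_type)):
            for name, count in sorted(counts.items()):
                if count / self.total > max_bad_ratio:
                    messages.append(f"{name}: {label} in {count}/{self.total} records")
        if self.new_fields:
            # New fields are tolerated (schema drift), but worth a heads-up.
            messages.append(f"new fields seen: {sorted(self.new_fields)}")
        return messages


def validate(
    records: Iterable[dict[str, Any]],
    required: dict[str, tuple[type, ...]] = REQUIRED_FIELDS,
) -> ValidationReport:
    """Scan a batch without rejecting it; just record what looks off."""
    report = ValidationReport()
    for rec in records:
        report.total += 1
        for name, types in required.items():
            if rec.get(name) is None:
                report.missing[name] = report.missing.get(name, 0) + 1
            elif not isinstance(rec[name], types):
                report.wrong_type[name] = report.wrong_type.get(name, 0) + 1
        report.new_fields |= rec.keys() - required.keys()
    return report
```

You'd run `validate(batch).alerts()` after each load and route any messages to whatever paging or chat channel you already use. The batch still lands either way; the check only tells you when the upstream changed.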