by sspaeti
1260 days ago
The goal of the open data stack is to let companies reuse existing, battle-tested solutions and build on top of them, instead of reinventing the wheel by re-implementing key parts of the Data Engineering Lifecycle for each component of the data stack. In the past, without these tools available, the story usually went something like this:

- Extracting: "Write some script to extract data from X."
- Visualizing: "Let's buy an all-in-one BI tool."
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"