by sspaeti 1260 days ago

The goal of the open data stack is to let companies reuse existing, battle-tested solutions and build on top of them, instead of reinventing the wheel by re-implementing key parts of the Data Engineering Lifecycle for every component of the data stack.

In the past, without these tools available, the story usually went something like this:

- Extracting: "Write some script to extract data from X."
- Visualizing: "Let's buy an all-in-one BI tool."
- Scheduling: "Now we need a daily cron."
- Monitoring: "Why didn't we know the script broke?"
- Configuration: "We need to reuse this code but slightly differently."
- Incremental Sync: "We only need the new data."
- Schema Change: "Now we have to rewrite this."
- Adding new sources: "OK, new script..."
- Testing + Auth + Pagination: "Why didn't we know the script broke?"
- Scaling: "How do we scale up and down this workload?"
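
To make the "Incremental Sync" and "Pagination" items concrete, here is a rough sketch of the kind of logic each hand-written extract script ends up re-implementing. Everything in it is hypothetical and only for illustration: `SOURCE_ROWS` stands in for a real API or database, and `fetch_page`, `extract_incremental`, and the `state` dict are made-up names, not part of any particular tool.

```python
from typing import Iterator

# Hypothetical in-memory "source"; a real script would call an API or database.
SOURCE_ROWS = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
    {"id": 4, "updated_at": 40},
    {"id": 5, "updated_at": 50},
]


def fetch_page(cursor: int, offset: int, page_size: int) -> list[dict]:
    """Return one page of rows changed after `cursor` (stand-in for an API call)."""
    changed = [row for row in SOURCE_ROWS if row["updated_at"] > cursor]
    return changed[offset:offset + page_size]


def extract_incremental(state: dict, page_size: int = 2) -> Iterator[dict]:
    """Yield rows newer than the saved cursor, page by page.

    The cursor in `state` is advanced only after every page has been read.
    """
    cursor = state.get("cursor", 0)
    offset = 0
    newest = cursor
    while True:
        page = fetch_page(cursor, offset, page_size)
        if not page:
            break  # no more pages
        for row in page:
            newest = max(newest, row["updated_at"])
            yield row
        offset += len(page)
    state["cursor"] = newest  # remember where the next run should start


state: dict = {}
first_run = list(extract_incremental(state))   # full load on the first run
second_run = list(extract_incremental(state))  # nothing new, so no rows
```

Even this toy version has to track a cursor, page through results, and persist state between runs, and it still has no auth, retries, schema handling, or monitoring. That is the repeated work the open data stack is meant to take off your hands.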