by wildengineer 2615 days ago
It sounds like your org needs two things:

1. A data warehouse for this data

2. Awareness of software/data best practices

That being said, while I agree code duplication is bad, data duplication isn't, as long as you are maintaining data lineage. In some cases data duplication is good.
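To make "maintaining data lineage" concrete, here is a rough sketch of what I mean. Every name in it (LineageRecord, duplicate_with_lineage, is_stale, the dataset names) is made up for illustration and is not any particular tool's API. The idea is just that each copy carries a record of where it came from and what the source looked like at copy time.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry: where a copy came from and how it was made."""
    dataset: str          # name of the derived (duplicated) dataset
    source: str           # name of the dataset it was copied from
    source_checksum: str  # fingerprint of the source rows at copy time
    transform: str        # human-readable description of the step
    created_at: str       # UTC timestamp of the copy


def checksum(rows):
    """Stable fingerprint of a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def duplicate_with_lineage(rows, source, dataset, transform="verbatim copy"):
    """Return a copy of rows plus a record tying the copy to its source."""
    copy = [dict(r) for r in rows]
    record = LineageRecord(
        dataset=dataset,
        source=source,
        source_checksum=checksum(rows),
        transform=transform,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    return copy, record


def is_stale(record, current_source_rows):
    """True if the source has changed since the copy was taken."""
    return record.source_checksum != checksum(current_source_rows)
```

With something like this in place, a duplicate is no longer an orphan: you can always answer "which source did this come from, and is it still current?" In practice you would store these records in a catalog or metadata table rather than in memory.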

I also wouldn't care too much that you have 100 GB max in a big data architecture. So what? It's not like you're going to be able to get rid of that architecture easily. A data warehouse built from a new set of pipelines seems like the biggest bang for your buck.