by FridgeSeal 2109 days ago
Even if you have an ETL pipeline to an OLAP database/data warehouse/etc, if your core database design is hostile to the analytics/etc then it's going to be a pain no matter how carefully they use it.

> it's really important to have a single owner for that database, or you'll never be able to evolve the schema...

IMO, the "owning" application/developers reserve the right to evolve the schema, and if that temporarily breaks ETL, then so be it, but the underlying schema itself shouldn't be hostile to analytics/etc.


> Even if you have an ETL pipeline to an OLAP database/data warehouse/etc, if your core database design is hostile to the analytics/etc then it's going to be a pain no matter how carefully they use it.

Disagree. You don't need a single "core database design". It's fine to have different representations of your data for different purposes, and a transformation pipeline between them; that's the whole idea of CQRS etc.
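
To make "different representations with a transformation pipeline between them" concrete, here is a minimal sketch. The `orders`/`order_items` write-side tables and the flat `order_summary` read model are hypothetical names chosen for illustration, not anything from a real system, and a real pipeline would likely run incrementally rather than rebuilding the table each time.

```python
import sqlite3


def build_read_model(conn: sqlite3.Connection) -> None:
    """Project the normalized write-side tables into one flat table for analytics."""
    conn.executescript("""
        DROP TABLE IF EXISTS order_summary;
        CREATE TABLE order_summary AS
        SELECT o.id                      AS order_id,
               o.customer                AS customer,
               COUNT(i.id)               AS item_count,
               SUM(i.qty * i.unit_price) AS total_cents
        FROM orders o
        JOIN order_items i ON i.order_id = o.id
        GROUP BY o.id, o.customer;
    """)


def demo() -> list:
    """Create a tiny write-side schema (hypothetical), then query the read model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER,
            qty INTEGER,
            unit_price INTEGER  -- stored in cents to avoid float rounding
        );
        INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO order_items VALUES (1, 1, 2, 500), (2, 1, 1, 250), (3, 2, 3, 100);
    """)
    build_read_model(conn)
    return conn.execute(
        "SELECT order_id, customer, item_count, total_cents "
        "FROM order_summary ORDER BY order_id"
    ).fetchall()
```

Analysts would query `order_summary` only; the application keeps writing to the normalized tables and is free to change them as long as the projection is updated.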

Yes, I'm not disagreeing there; I'm all for pipelines and CQRS and dedicated databases for dedicated purposes. The point I'm making is that if the original schema is a pain to work with, you can have as many pipelines and databases as you want, getting the actual data you want isn't any less of a pain.

> if the original schema is a pain to work with, you can have as many pipelines and databases as you want, getting the actual data you want isn't any less of a pain.

I don't think that's really true. If the original schema is just something you're ingesting before transforming then it doesn't really matter how bad it is; all you're gonna be doing is scanning over all the tables one way or another.
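
A rough sketch of what "scan everything, then transform" can look like, assuming (purely for illustration) that the awkward source is an entity-attribute-value table; `pivot_eav` and the sample rows are hypothetical:

```python
from collections import defaultdict


def pivot_eav(rows):
    """Single pass over (entity_id, attribute, value) rows, one dict per entity."""
    wide = defaultdict(dict)
    for entity_id, attribute, value in rows:
        wide[entity_id][attribute] = value
    return dict(wide)


# Hypothetical rows as they might come out of a full table scan.
raw = [
    (1, "name", "alice"),
    (1, "plan", "pro"),
    (2, "name", "bob"),
]
users = pivot_eav(raw)
```

The scan itself is simple under this assumption; whether the awkwardness of the source still leaks through (missing attributes, inconsistent value types) depends on the data, which is the part the thread disagrees on.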