Hacker News new | ask | show | jobs
by MichaelMoser123 1133 days ago
Data denormalization makes sense if the data is written once and never updated - like with a data warehouse / analytics.

If you need to update the data then denormalization can turn into a big source of trouble. For example you end up with many copies of the same stuff, and you must make an extra effort to update all the duplicates upon update.

Or you end up with multiple entries, where the validity of an entry is determined by some extra 'isValid == true' or 'deleted == false' field. Now all these 'invalid' entries then start to clog up the table/collection, and performance may quitely deteriorate.

I once had to use a denormalized schema for nested data, as lookup through too many reference would have suffered. But that wasn't funny at all.

2 comments

I'd argue that most of the time it is better to write normalized data and use some form of permanently existing database views (rather than actual table) to read from pseudo-denormalized data. That way you combine the best of both worlds.
That really, really depends on whether you’re driving something that gets read often enough that query complexity ruins your product’s ability to predictably deliver on query deadlines — whether that’s “load a webpage” deadline or a “submit half a million payments to the bank before it closes” deadline.
I've started doing this (materialized views denormalizing data) at work more recently and it's been immensely helpful. Surprised more people don't do this.
Data denormalization also helps with restoring individual tables and sharding. IMO one should aim for normalization and slowly denormalize only if needed.