by ajxs 2261 days ago
> "With denormalizing, data integrity is more of an application concern. You'll need to consider when this duplicated data can change and how to update it if needed. But this denormalization will give you a greater scale than is possible with other databases."

There's the big catch. As another poster pointed out, normalisation is not about efficiency; it's about correctness. People have been quick to compare storage cost against compute cost, but the cost of development and bug-fixing time trumps both of them by an order of magnitude. The guarantee of referential integrity alone that SQL offers helps eradicate an entire class of bugs from your application with no added effort. The article glosses blithely over this critical caveat. Whenever this discussion comes up I'm quick to refer back to the yardstick of "Does your application have users? If so, then its data is relational". I can't wait for the day when we look back at NoSQL as the 'dancing sickness' of the IT world.
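To make the referential integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `orders` tables and the `try_orphan_insert` function are made up for illustration and are not from the article. The database itself rejects an order that points at a user who doesn't exist, with no application code checking for it.

```python
import sqlite3


def try_orphan_insert() -> bool:
    """Return True if the database rejects an order for a non-existent user."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id),"
        " item TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    # Valid: user 1 exists, so this row is accepted.
    conn.execute("INSERT INTO orders (user_id, item) VALUES (1, 'book')")
    try:
        # Invalid: there is no user 999, so the constraint should fire.
        conn.execute("INSERT INTO orders (user_id, item) VALUES (999, 'ghost')")
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False
```

In a denormalised store, the equivalent check would have to live in every code path that writes an order.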

It's also worth asking: 'At what scale does this tradeoff become worthwhile?' Another poster here correctly pointed out that modern versions of Postgres scale remarkably well. The tipping point where this kind of NoSQL implementation becomes the most efficient option is likely to be far beyond the scale of most products. It's undeniable that completely denormalising your data will make reads much faster. That does not mean you need to throw the baby out with the bathwater and store your master data in NoSQL.


> I can't wait for the day when we look back at NoSQL as the 'dancing sickness' of the IT world.

It does have some compelling use cases. It’s just that relational data isn’t one of them. If you have a use case with a low potential for write contention, a tolerance for eventual consistency, a very simple data structure, and a high demand for read throughput, then it’s great. One area where I’ve seen it used with great success is content publishing. You have one author, perhaps an additional editor/proofreader, the content is one document (with perhaps one other related document, like an author bio), and hopefully you want thousands or perhaps millions of people to be able to get decent read performance. Another example could be pretty much anything you’d typically use a materialized view for in a DB. You can compute the view in your RDBMS, periodically publish it to a document database, and then offload all read throughput to a better-suited system.
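A rough sketch of that materialized view pattern, assuming a made-up `articles`/`authors` schema and using a plain dict as a stand-in for the document database (a real one would have its own client API, which isn't shown here). The relational side stays the source of truth, and the join is flattened into one JSON document per article for readers:

```python
import json
import sqlite3


def build_article_documents(conn: sqlite3.Connection) -> dict:
    """Join articles with their authors and return one denormalised document per article."""
    rows = conn.execute(
        "SELECT a.id, a.title, a.body, u.name, u.bio "
        "FROM articles a JOIN authors u ON u.id = a.author_id "
        "ORDER BY a.id"
    ).fetchall()
    return {
        f"article:{article_id}": json.dumps(
            {"title": title, "body": body, "author": {"name": name, "bio": bio}}
        )
        for article_id, title, body, name, bio in rows
    }


def demo() -> dict:
    """Build a tiny relational dataset and publish it to the stand-in document store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
    conn.execute(
        "CREATE TABLE articles ("
        " id INTEGER PRIMARY KEY,"
        " author_id INTEGER REFERENCES authors(id),"
        " title TEXT, body TEXT)"
    )
    conn.execute("INSERT INTO authors VALUES (1, 'Ada', 'Writes about databases')")
    conn.execute("INSERT INTO articles VALUES (1, 1, 'Hello', 'First post')")
    document_store = {}  # hypothetical stand-in for a document database
    # In practice this publish step would run periodically (cron, job queue, etc.).
    document_store.update(build_article_documents(conn))
    conn.close()
    return document_store
```

Reads then hit `document_store` by key without touching the RDBMS, at the price of the documents being as stale as the last publish run.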

NoSQL is usually used wrong imo, but that doesn’t mean there aren’t ways to use it right. There are valid use cases for graph databases and stream processing systems too. But they’re not hip enough to produce the same volume of highly questionable web apps.

You're absolutely right about the materialised view use case. I wasn't going to labour the point by going into extra detail in my post, but the most successful use I've seen for NoSQL databases is periodically aggregating complex relational data structures into a single document record. It's not that the technology is inherently wrong (in most cases; MongoDB is another story), it's just that widespread misuse is giving it a bad name.