by j_baker 5341 days ago
I'm calling troll.

Having a serious discussion about NoSQL databases raises the exact same question as having a serious discussion about cancer: what kind would you like to have a serious discussion about?

I think the most important lesson we can learn from NoSQL in general is that the idea of a one-size-fits-all database is becoming dated. NoSQL databases certainly don't solve the problems the author points out, and they probably never will. In fact that's the point. By not solving one set of problems, you allow yourself to solve another set of problems.

How about we use databases to solve the problems they were meant to solve, rather than basing our choices on whatever the popular opinion is at the moment?


"I think the most important lesson we can learn from NoSQL in general is that the idea of a one-size-fits-all database is becoming dated."

For programming languages, using the "right tool for the job" has little downside. Perhaps the developers need to learn an extra language, or perhaps there is some communication overhead between them. But unless the components are tightly coupled, there's not much of a loss.

In contrast, the value of the whole data is greater than the sum of the parts. If you have a website selling products and an inventory management system and an automatic price-setting tool, it's hard to use a different DBMS for each one.

Even for data sets that seem unrelated at first, there may be a lot of value in the small connections between them. This is becoming increasingly apparent and companies are trying very hard to see these connections. Being in separate systems just makes that more difficult.

So, there are good reasons to use multiple database systems, but there is also a much higher cost. Saying "use the right tool for the job" doesn't give any guidance about when it's worth the cost and when it's not.

I think you're mixing concerns a bit. For data warehousing purposes, I agree that it's absolutely preferable to have all the data in one place (like Hadoop/HDFS).

For production OLTP stuff, I'd argue that it's a bad idea to do the kind of processing you're talking about in the database unless you can't avoid it. Beyond the performance implications, you'll likely have to alter your schema in unnatural ways that you otherwise wouldn't.

Now, I absolutely agree that you need to do a cost/benefit analysis and that there are costs associated with having multiple databases. But I don't think those costs are as high as they would appear on first intuition.

I think you can run into problems in OLTP, as well. To stick with the example, you have three systems: sales from the website, price-setting tool, and inventory system.

Should the sale happen at all? Not if the inventory is depleted. Sure, you can put it on back-order, but then you have an unhappy customer.

At what price should the sale happen? It would be nice if you could automatically raise prices when the inventory drops below 10 units (which may indicate a demand spike or a supply interruption), for example. If you don't raise prices soon enough, you're more likely to run into a depleted inventory, again making the customer unhappy.
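To make the point concrete, here is a rough sketch of what "sale, inventory check, and price rule in one place" might look like when everything lives in a single database. It uses Python's built-in sqlite3 module purely for illustration. The table layout, the `sell_one` helper, and the 10% price bump are all made up for this example; only the 10-unit threshold comes from the paragraph above.

```python
import sqlite3

LOW_STOCK_THRESHOLD = 10  # "below 10 units", as in the comment above
BUMP_PERCENT = 110        # hypothetical rule: raise the price by 10%


def make_db():
    """Create an in-memory database with an illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, units INTEGER, "
        "price_cents INTEGER, bumped INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE TABLE sale (product_id INTEGER, price_cents INTEGER)")
    conn.commit()
    return conn


def sell_one(conn, product_id):
    """Sell one unit. Return the price charged, or None if out of stock."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT units, price_cents, bumped FROM product WHERE id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] <= 0:
            return None  # depleted inventory: refuse the sale up front
        units, price, bumped = row
        conn.execute(
            "INSERT INTO sale (product_id, price_cents) VALUES (?, ?)",
            (product_id, price),
        )
        conn.execute(
            "UPDATE product SET units = units - 1 WHERE id = ?", (product_id,)
        )
        # Price rule: bump once when stock drops below the threshold.
        if units - 1 < LOW_STOCK_THRESHOLD and not bumped:
            conn.execute(
                "UPDATE product SET price_cents = ?, bumped = 1 WHERE id = ?",
                (price * BUMP_PERCENT // 100, product_id),
            )
        return price
```

The stock check, the sale, and the price change all see the same data at the same moment. With three separate systems, each of those steps becomes a call across a boundary that can be stale or fail.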

And what if you encounter an error moving data between systems? The customer thinks the sale happened, but it wasn't (or couldn't be) loaded into the inventory system for some reason. The customer will call a week later asking why it still hasn't shipped, the service rep will be clueless trying to trace between the systems, and ultimately the customer will be unhappy.
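For contrast, here is a minimal sketch of the same failure when the sale and the inventory record sit in one database and are written in one transaction. Again this is Python's sqlite3 with invented table names, and `fail_inventory_update` is just a hypothetical stand-in for whatever goes wrong on the second write.

```python
import sqlite3


def record_sale(conn, product_id, fail_inventory_update=False):
    """Record a sale and decrement stock as a single atomic unit."""
    with conn:  # one transaction: both writes commit, or neither does
        conn.execute("INSERT INTO sale (product_id) VALUES (?)", (product_id,))
        if fail_inventory_update:
            # Simulated failure between the two writes.
            raise RuntimeError("inventory update failed")
        conn.execute(
            "UPDATE inventory SET units = units - 1 WHERE product_id = ?",
            (product_id,),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (product_id INTEGER)")
conn.execute(
    "CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, units INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES (1, 5)")
conn.commit()

try:
    record_sale(conn, 1, fail_inventory_update=True)
except RuntimeError:
    pass  # the caller can tell the customer the order did not go through

# The failed attempt left no half-recorded sale behind.
sales = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
units = conn.execute(
    "SELECT units FROM inventory WHERE product_id = 1"
).fetchone()[0]
```

Across two separate systems there is no shared rollback like this, so you end up building your own reconciliation (retries, idempotency keys, audit jobs) to get a comparable guarantee.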

(Just to be clear: properly integrated data management may still be done with multiple systems. But it's harder.)