by jeffdavis 5341 days ago
"I think the most important lesson we can learn from NoSQL in general is that the idea of a one-size-fits-all database is becoming dated."

For programming languages, using the "right tool for the job" has little downside. Perhaps the developers need to learn an extra language, or perhaps there is some communication overhead between them. But unless the components are tightly coupled, there's not much of a loss.

In contrast, with data, the value of the whole is greater than the sum of the parts. If you have a website selling products and an inventory management system and an automatic price-setting tool, it's hard to use a different DBMS for each one.

Even for data sets that seem unrelated at first, there may be a lot of value in the small connections between them. This is becoming increasingly apparent and companies are trying very hard to see these connections. Keeping the data in separate systems just makes that more difficult.

So, there are good reasons to use multiple database systems, but there is also a much higher cost. Saying "use the right tool for the job" doesn't give any guidance about when it's worth the cost and when it's not.

1 comments

I think you're mixing concerns a bit. For data warehousing purposes, I agree that it's absolutely preferable to have all the data in one place (like Hadoop/HDFS).

For production OLTP stuff, I'd argue that it's a bad idea to do the kind of processing you're talking about in the database unless you can't avoid it. Beyond the performance implications, you'll likely have to alter your schema in unnatural ways that you wouldn't otherwise.

Now, I absolutely agree that you need to do a cost/benefit analysis and that there are costs associated with having multiple databases. But I don't think those costs are as high as they would appear on first intuition.

I think you can run into problems in OLTP, as well. To stick with the example, you have three systems: sales from the website, price-setting tool, and inventory system.

Should the sale happen at all? Not if the inventory is depleted. Sure, you can put it on back-order, but then you have an unhappy customer.

At what price should the sale happen? It would be nice if you could automatically raise prices when the inventory drops below 10 units (which may indicate a demand spike or a supply interruption), for example. If you don't raise prices soon enough, you're more likely to run into a depleted inventory, again making the customer unhappy.

And what if you encounter an error moving data between systems? The customer thinks the sale happened, but it wasn't (or couldn't be) loaded into the inventory system for some reason. The customer will call a week later asking why the order still hasn't shipped, the service rep will be clueless trying to trace it between the systems, and ultimately the customer will be unhappy.

(Just to be clear: properly integrated data management may still be done with multiple systems. But it's harder.)