by j_baker 5341 days ago
I think you're mixing concerns a bit. For data warehousing purposes, I agree that it's absolutely preferable to have all the data in one place (like Hadoop/HDFS).

For production OLTP stuff, I'd argue that it's a bad idea to do the kind of processing you're talking about in the database if you can avoid it. Beyond the performance implications, you'll likely have to alter your schema in unnatural ways that you otherwise wouldn't need.

Now, I absolutely agree that you need to do a cost/benefit analysis and that there are costs associated with having multiple databases. But I don't think those costs are as high as they would appear on first intuition.

1 comment

I think you can run into problems in OLTP, as well. To stick with the example, you have three systems: the website's sales system, the price-setting tool, and the inventory system.

Should the sale happen at all? Not if the inventory is depleted. Sure, you can put it on back-order, but then you have an unhappy customer.

At what price should the sale happen? It would be nice if you could automatically raise prices when the inventory drops below 10 units (which may indicate a demand spike or a supply interruption), for example. If you don't raise prices soon enough, you're more likely to run into a depleted inventory, again making the customer unhappy.
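To make those two questions concrete, here is a minimal sketch of the decision, assuming the sales code can see current inventory at the moment of the sale. The function name `quote_sale` and the 10% markup are hypothetical; only the 10-unit threshold comes from the example above.

```python
from decimal import Decimal
from typing import Optional

LOW_STOCK_THRESHOLD = 10            # from the example: raise prices below 10 units
LOW_STOCK_MARKUP = Decimal("1.10")  # hypothetical 10% markup, purely illustrative


def quote_sale(base_price: Decimal, units_in_stock: int) -> Optional[Decimal]:
    """Return the price to charge, or None if the sale should not happen."""
    if units_in_stock <= 0:
        # Inventory is depleted: refuse the sale rather than back-order it.
        return None
    if units_in_stock < LOW_STOCK_THRESHOLD:
        # Low stock may signal a demand spike or supply interruption.
        return (base_price * LOW_STOCK_MARKUP).quantize(Decimal("0.01"))
    return base_price
```

The rule itself is trivial; the hard part is that `units_in_stock` has to be fresh, which is easy when sales and inventory share a database and harder when they don't.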

And what if you encounter an error moving data between systems? The customer thinks the sale happened, but it wasn't (or couldn't be) loaded into the inventory system for some reason. The customer will call a week later asking why the order still hasn't shipped, the service rep will be clueless trying to trace it between the systems, and ultimately the customer will be unhappy.
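For contrast, here is a rough sketch of what you get when sales and inventory live in the same database: the sale and the stock decrement either both happen or neither does. It uses Python's built-in `sqlite3` as a stand-in; the table names, column names, and `record_sale` helper are all hypothetical.

```python
import sqlite3


def record_sale(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Record a sale and decrement inventory atomically; False if stock is short."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE inventory SET units = units - ? "
                "WHERE item_id = ? AND units >= ?",
                (qty, item_id, qty),
            )
            if cur.rowcount != 1:
                # No row was updated: unknown item or not enough stock.
                raise LookupError("insufficient stock")
            conn.execute(
                "INSERT INTO sales (item_id, qty) VALUES (?, ?)", (item_id, qty)
            )
        return True
    except LookupError:
        return False


# Hypothetical schema and starting data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, units INTEGER NOT NULL)")
conn.execute("CREATE TABLE sales (item_id INTEGER NOT NULL, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory VALUES (1, 3)")
conn.commit()
```

With separate systems you have to build this guarantee yourself (retries, reconciliation jobs, and so on), which is where the tracing problem above comes from.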

(Just to be clear: properly integrated data management may still be done with multiple systems. But it's harder.)