by evgen 6083 days ago
> The reason databases are statically typed is because that saves storage space and processing time

The problem with this argument is that when I can slap several TB into every node in my db cluster, the storage cost issue is basically moot. When it is cheaper to buy ten small boxes than one beefy server, the processing time and query optimization issue also comes into question. At some point soon, and we may have already passed that point, the cost of funneling db queries through the small number of expensive, highly tuned servers needed to maintain the pillars of the RDBMS model will outweigh the benefit of those two optimizations, and it will make more sense to abandon both in favor of a fleet of small, cheap boxes running a distributed database.

The tradeoffs that were made several decades ago, when the RDBMS and SQL ascended to their current dominance, may have made sense at the time, but a lot has changed since then. You may try to delude yourself into thinking that there are "scalability problems a NoSQL system is going to run into that RDBMS solved 20+ years ago". But almost all of these systems were designed with an eye on the conditions under which RDBMS failed, and in light of decades of research into distributed systems and scalability that did not exist when SQL/RDBMS emerged. Given that, I find it hard to believe that there are scalability problems that RDBMS solved that NoSQL systems will run into. Hell, most of them were designed to solve specific scalability failures in big data systems where RDBMS fell over and died.

2 comments

You're putting a lot of faith into this. The fact is that performance, scalability, features, and correctness trade off against each other. RDBMS is optimized for certain conditions and these NoSQL databases are optimized for other conditions. NoSQL databases are horrible at things that RDBMSs do extremely well, and vice versa. NoSQL isn't some magic discovery -- of course they're going to run into problems that RDBMS solved years ago!

Your argument for small boxes seems to be the same argument for one large box. My (extremely average) desktop has a terabyte of storage in it -- why do I need a bunch of boxes running a distributed database? One big server is a hell of a lot easier to manage than a fleet (a fleet!) of small boxes.

You make a very valid point, and it's one I do think about. I try not to bash NoSQL, because it does make sense sometimes, but there's a recent tide of "SQL is dead, long live NoSQL" that ignores the fact that RDBMS are a superior solution in some cases. Really, I think most people agree that RDBMS are sometimes better, and the argument revolves around how often "sometimes" is.

I don't doubt that NoSQL systems solve scalability problems people experience when they use (or rather, misuse) RDBMS. My point is that the people switching are making trade-offs in terms of storage, data integrity, etc. that they may not be considering, because those things used to be "free" with RDBMS so they weren't a concern.
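To make the "free" part concrete, here is a rough sketch rather than anything from a real deployment. It assumes Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain dict as a stand-in for a schemaless key-value store; the `users` table, the `user:N` keys, and both function names are made up for illustration. The point is only that the database rejects a duplicate or missing email on its own, while the schemaless store accepts whatever it is handed unless the application checks first.

```python
import sqlite3


def rdbms_rejects_bad_rows():
    """Insert one good row, then try a duplicate and a NULL email."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the constraints live in the database, not the app.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))

    rejected = []
    for email in ("a@example.com", None):  # duplicate, then missing value
        try:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        except sqlite3.IntegrityError:
            # The database enforces UNIQUE and NOT NULL for us.
            rejected.append(email)

    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return rejected, count


def schemaless_accepts_everything():
    """A dict standing in for a key-value store: no checks unless we add them."""
    store = {}
    store["user:1"] = {"email": "a@example.com"}
    store["user:2"] = {"email": "a@example.com"}  # duplicate email, accepted
    store["user:3"] = {}                          # missing email, accepted
    return len(store)


if __name__ == "__main__":
    print(rdbms_rejects_bad_rows())
    print(schemaless_accepts_everything())
```

None of this says the schemaless side is wrong, only that the integrity checks move into application code, where someone has to remember to write them.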

It's certainly true that RDBMS were born in a very different age in terms of the expense of both storage and processing. As with MapReduce, which is an astonishingly inefficient but extremely fast and scalable way of crunching big volumes of data, NoSQL systems may be making use of huge leaps in the availability of resources to prioritize speed and scalability over other concerns.

However, as I mentioned, there seem to be a lot of newbie developers who aren't thinking about NoSQL vs. RDBMS in these terms ("speed" vs. "data integrity", say) but instead in terms of "easy" vs. "hard", or even just "new hotness" vs. "old and busted". As a result they may be avoiding one set of problems solved by NoSQL only to run into a totally different set solved by RDBMS, without considering which is more important for their app.