| (I also commented on the article, but nobody seems to be commenting over there, so reposting...) The advantages of a statically typed RDBMS that he's leaving out here are (a) storage optimization and (b) query optimization (as jawngee has already noted). You could get halfway to simulating a dynamically typed RDBMS by declaring all your columns as, say, VARCHAR(5000). You could store strings, integers, floats, dates, etc. all in there pretty simply. However, they would use a lot more storage space as strings than they would as native data types (e.g. the integer 1000 fits in a two-byte SMALLINT, while the string '1000' takes at least four bytes). Over a large data set that would really add up. Secondly, when doing queries, your comparison operators (e.g. WHERE date > '2009-07-06') would be way less efficient as string comparisons than as native type comparisons. I don't want to be dismissive and curmudgeonly about this, but over and over what I hear from people enthusiastic about NoSQL solutions is that they solve the current problems of RDBMS, while forgetting all the great features that we have spent the last 30 years building into database systems. The current "limitations" of SQL-based systems are often in fact age-old trade-offs that we made, but people have forgotten the benefits of those trade-offs. The reason databases are statically typed is that it saves storage space and processing time. The reason there's a standardized, domain-specific language that you have to learn is that having to learn a completely new API and mental model every time you want to access a data store from a new vendor is inefficient. Oh, and the reason SQL is so complicated is that relational algebra is complicated. Sure, SQL and RDBMS have their limitations. They're not the right tool for every job, and they are not even 100% perfect at the jobs where they are the right tool. But too often I hear people saying "fuck SQL!" simply because they don't want to learn it, and because they're too early on in their little pet project to realize the scalability problems a NoSQL system is going to run into that RDBMS solved 20+ years ago. |
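To put rough numbers on the storage and comparison point in the quoted comment, here is a minimal Python sketch. It assumes a fixed-width 4-byte integer column and counts only the digit bytes for the text form (a real VARCHAR adds a length prefix on top). The helper names `native_size` and `text_size` and the sample values are made up for illustration; none of this models a particular database engine.

```python
import struct


def native_size(n: int) -> int:
    """Bytes used by a fixed-width 4-byte signed integer (an assumed INT column)."""
    return len(struct.pack("<i", n))


def text_size(n: int) -> int:
    """Bytes used by the decimal digits alone, ignoring any VARCHAR length prefix."""
    return len(str(n).encode("ascii"))


# Hypothetical sample values, chosen only for illustration.
samples = [1000, 1246838400, 2000000000]

for n in samples:
    print(n, "native:", native_size(n), "bytes, text:", text_size(n), "bytes")

# Digit strings compare character by character, so their ordering disagrees
# with numeric ordering unless every row is parsed or cast first.
print("1000" < "999")  # string comparison
print(1000 < 999)      # integer comparison
```

Small values can tie with (or even beat) a 4-byte integer in this simplified count; the gap shows up with larger values, and the per-row parse needed to compare text correctly is extra work that a native type avoids.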
The problem with this argument is that when I can slap several TB into every node in my db cluster, the storage cost issue is basically moot. When it is cheaper to buy ten small boxes than one beefy server, the processing time and query optimization argument also comes into question. At some point soon, and we may have already passed it, the cost of funneling db queries through the small number of expensive, highly tuned servers needed to maintain the pillars of the RDBMS model will be outweighed by the benefits of abandoning both optimizations in favor of a fleet of small, cheap boxes running a distributed database.
The tradeoffs that were made several decades ago, when the RDBMS and SQL ascended to their current dominance, may have made sense at the time, but a lot has changed since then. You may try to delude yourself into thinking that there are "scalability problems a NoSQL system is going to run into that RDBMS solved 20+ years ago", but I find that hard to believe. Almost all of these systems were designed with an eye on the conditions under which RDBMSs failed, and in light of decades of research into distributed systems and scalability that did not exist when SQL/RDBMS emerged. Hell, most of them were designed to solve specific scalability failures in big data systems where RDBMS fell over and died.