That's one of the tough myths to overcome when building on top of a relational database. Changing the Database has traditionally been Hard, so everybody has been trained to think that way, therefore nobody is allowed to change the database because Changing the Database is hard. Try it at your bigco and the old guys will do everything in their power to stop you (thus making changing the database hard to do).
Ignore that rule and build yourself an environment where changing the database is easy. I make schema changes to my stuff all the time, and seldom push a release live that doesn't do so. The tools are in place to ensure that it's No Big Deal, so it just works.
If you live in a world where changing your SQL database is easy, it sort of takes the wind out of the "start with NoSQL, because changing the database is easy" argument. You get all the speed advantages of being schema-flexible, and you can write ad-hoc queries when you want to, so you're flexible in that direction too.
Let me clarify - I'm a startup hacker, not a bigco guy at all :)
And I still use MySQL all the time, right alongside so-called NoSQL solutions where they are better fit to a given purpose: Membase for high-availability collections on the order of billions-of-records in a social game; MySQL for defining the game world itself; mongoDB for any and all data for which eventual-consistency doesn't matter (e.g. analytics). I've streamlined my MySQL dealings in precisely the ways you outlined. I have change scripts for every schema change, and I have YML-driven schema auto-generation in Symfony and declarative. Schema changes are pushed out through staging to production - lazily when possible, and actively when possible.
But despite all the process improvements in the world, the dev time savings that go along with a smart, document-oriented model layer are not a myth. I assure you, they are very real. No longer does every new feature have to start with schema design (no matter how streamlined your schema alteration process may be, it has a nonzero cost, and one which definitely increases with scale). You can instead just get right to the code, and start setting and getting the properties your new feature will need.
The fact remains: SQL is not a one-size-fits-all solution any more than the recent "massively scalable" data stores are. A modern backend engineer should know a lot about a variety of datastore solutions, and should think long and hard about which data should be stored in which manner(s).
- Change scripts for every schema change, stored in source control.
- An automated build/deploy that pulls down new change scripts and executes them in order.
- (optional) a good way to generate your backend CRUD by looking at the existing database schema, or as a lesser option an ORM that does the same thing.
So your workflow is: script out your schema change, check it in, apply it to your dev environment, regenerate the CRUD from your local schema, fix any compile errors that you've introduced. (and optionally make sure all your unit tests still pass).
You'll notice that all that stuff above is just the basic workflow you should have in place anyway.
The problem is not whether you can, but how long does it take. At least with mysql, alter table for a large table will take a long time (hours, if not more), during which the table cannot be written to.
That's not representative of RDBMSs in general. In Oracle we don't think twice about adding a column to a table during production hours. The only issue is if the new column has a default value and you have millions of rows that you need to "backfill" but that's just a big transaction; it's nothing remarkable in and of itself, if you could do a transaction that big anyway, you'd just go ahead and do it. And of course, in Oracle readers don't block writers and writers don't block readers, we have MVCC.
Once again, NoSQL is shown to be a reaction against MySQL, not RDBMSs in general.
That was but one example out of many. Oracle is a great database. It isn't great at everything, however. Geographically distributed, fault tolerant, scale out architectures as often required for big online services? Not a great fit. Multi-petabyte complex analytics processing? Not a great fit.
There is a set of relational database folks fixated on the false dichotomy of relational databases OR non-relational databases. In practice, they are often combined in a variety of ways. Insisting on One True Database is like insisting on One True Operating System or One True Programming Language. Stop obsessing over tools and build useful stuff!
Oh indeed, right tool for the job, I'm 100% with you there.
The issue is, from the NoSQL camp we hear about "schema rigidity". We hear "SQL doesn't scale". These things simply aren't true! It's as if someone had only ever written .BAT files on DOS and thought that all its limitations applied to Python as well (and went and told experienced professional Python devs that!).
There have been "object repositories" such as Versant for a long time. The NoSQL types seem oblivious to these too.
Schema rigidity is a canard originating with the same folks who think a hash is a type system. I encourage you to dismiss the folks who say things like that, rather than dismissing some very useful technologies.
yes, but oracle is expensive, especially since it requires people who really know about oracle if you want to be up to speed relatively quickly. So as always, it is a tradeoff: it seems that in some cases, not having the usual RDBMS guarantees is ok because there are less admin costs, etc...
So sure, some people don't understand those tradeoff and make stupid choices. But people who choose technology without properly assessing the risks/advantages are bound to fail anyway.
If your major expense is people who know what they're doing, then Oracle is not even the most expensive platform... During any web boom I bet LAMP guys were billing higher hourly rates than people doing Oracle!
Google and Facebook have all the money and talent required to deploy epic Oracle systems. Instead they use GFS and BigTable and Cassandra and HBase and Scribe and Hadoop and MySQL and a host of other systems. Amazon has massive Oracle deployments, so plenty of money and knowledge on the topic, but still built Dynamo and S3.
There are billions of dollars riding on this for them. Instead of insisting they are wrong, ask yourself why they might be right.
Google and Facebook aren't good examples, because they have no hard transactional requirements for their main applications. If a web page isn't included in one search result but is in the same search executed on a different node 5 minutes later, who would notice or care? If a status update gets dropped, it might be annoying, but you can always just resend it.
If you want to compare like for like, ask why Visa isn't using MongoDB for authorizations, or why American Airlines isn't using Redis for reservations.
So you are aware, yours is the traditional response. Somehow, doing what Google and Facebook do is "easy" because not everything requires transactions. This is false both because scale like that makes almost everything difficult, and because, as Google recently published, they are using transactions for their main application. NoSQL does not imply lack of transactions and transactions do not imply relational databases.
I doubt that any DB that stores data physically in a row oriented format doesn't have issues with this particular type of change. But of course there are many other ways to work around that. For instance, you could just create a new table and a view to join the two. Or, if a particular table changes all the time, redesign it to store key value pairs instead. RDBMS can easily be used in a schemaless fashion if needed for particular scenarios.
Whether it saves time or not depends on a lot of things. If users actually determine the structure of your data then you're right. If it's just about schema evolution by developers I don't think that moving schema constraints into procedural code makes things simpler or more flexible.
RDBMS does require a bit of planning (not too much, and not too rigid) and previous experiences building good data model; both traits don't exist in hotshot hackers/startups these days because they either don't have the patience or much deeper experience.
I seriously doubt that this is true. SQL databases are extremely easy to adjust. Hell you can add columns on your live database in the middle of the day if you want. Removing data is harder but normally you don't have to delete columns just to launch your new code.
Another thing to remember is that most SQL systems are mature and there is pretty much always very good tools available to do any kind of changes you may need to do.
The existence of great tools for relational databases is a compelling argument for using them. As I said in the article, starting off using a single, monolithic relational store is a successful approach employed by many, successful companies. I would suggest, though, that the rest of your comment indicates a lack of experience with relational databases at large-scale. One metric provided by Twitter in a presentation was that an ALTER TABLE command took 2 weeks to run on a previously centralized relational database. Perhaps someone from Twitter can add some color to that anecdote.
Like I say, in business there is no cheap or expensive. There's worth the money, or not.
Yes, Oracle costs money. But so does "rolling your own". How much of their VCs cash has say Twitter spent doing that? Yes, Oracle "locks you in". But your own legacy code locks you in whatever platform you've built it on.
In the specific case of FlockDB, that's actually not the case as the shards are modular. SQLShard is one implementation. There was an experimental Redis shard implementation, as well. You might be both overestimating the cost of building things like this and underestimating the enormous drag of using closed source components in these systems. If you've personally built online services this big with Oracle, well done. What did it cost?
What scale? How many concurrent users of your specific site break MySQL, and how many concurrent users do you have? Not answering or considering those questions is what leads to premature scaling.
Ignore that rule and build yourself an environment where changing the database is easy. I make schema changes to my stuff all the time, and seldom push a release live that doesn't do so. The tools are in place to ensure that it's No Big Deal, so it just works.
If you live in a world where changing your SQL database is easy, it sort of takes the wind out of the "start with NoSQL, because changing the database is easy" argument. You get all the speed advantages of being schema-flexible, and you can write ad-hoc queries when you want to, so you're flexible in that direction too.