| NoSQL discussions always seem to conflate three very different things: storage engines, APIs and architecture. Where do we store the data ? How do we access it ? How do we make sure it scales ? The "traditional" approach is to use Oracle/SQL Server/MySQL for storage, SQL and/or ORM as an API, and single-server tables-with-relationships as an architecture. Back in the early 2000s, everybody did this. Sure, there were a few performance-minded exceptions that went with sharding or master-slave architectures instead, but those were exceptions. And single-server architectures tend to behave badly at medium loads. Spend the market rate for a genius DBA, and they still behave badly at high loads. The next step is a 32-core 128GB RAM monstrosity that costs an order of magnitude more than what eight 4-core 16GB servers would cost. Most NoSQL solutions came with a new architecture. You had the MongoDB flavor of distributed storage, or the BigTable flavor of distributed storage, or the CouchDB flavor of distributed storage, and so on. Properly implemented distributed storage eats high loads for breakfast: just add more servers. This is a good thing. My issue with the NoSQL movement is that they threw away the baby with the bath water. They threw away the single-server relational architecture, which was a nice change, and they also gave up the old battle-hardened storage engines and the highly expressive SQL language and replaced them with only-recently-experimental engines and ad hoc lean APIs. It takes time for a storage engine to mature. To have all its performance kinks ironed out and all its bugs smoked out. I still remember the brouhaha around MongoDB persistence guarantees, or the critical data loss bugs in CouchDB. And the lean APIs just forced back all the querying logic into the application, with all the filtering and the manual indexing and the joins and the approximate but ultimately incorrect implementations of whatever subset of ACID was required at the time. This wasn't an entirely bad thing: it certainly made many developers aware of the performance implications of some joins or transactions. But when you need to write a JOIN or GROUP BY or BEGIN TRANSACTION that you know will scale properly, and there's no API support for it ? Feh. I'm a huge fan of the CouchDB architecture. Distributed change streams, with checkpointed views and cached reductions. But I have been burned by the CouchDB storage engine (can you say "data corÊ–NÑ %ñXtion" ?) and I see no point in bending knee to the laconic CouchDB API. So I took the CouchDB architecture and reimplemented it with a PostgreSQL back-end. It's _faster_ (don't underestimate the cost of those HTTP requests), I have trust that after PostgreSQL's decade-long history all threats to my data are long gone, and I can always whip out an SQL query when I do need it. It's nice to see so many NoSQL solutions migrating back to an SQL-like API and gaining enough maturity to keep your data safe. In the near future, I expect them to be nothing more than "Architecture in a box" solutions for when you don't want to implement specific architectures in SQL. And I expect more and more "architecture plugins" to become available: with a library, turn a herd of SQL databases into a distributed architecture of type X. |