| Here's my personal version of the story.

With SQL DBs (Oracle, SQL Server and MySQL):

1. SQL database migrations were killing us. Going back and forth in a dev environment was impossible. No hot deploy in production.
2. We could not work well with application user-defined fields: adding columns ad hoc to the database, indexing them, normalizing and denormalizing, performance issues, everything was a problem.
3. Blobs holding logging data got unmanageable quickly.
4. Joins were very hard to optimize, even though the team had a lot of DBA experience fine-tuning databases.
5. We had to build a very complex architecture around the database for a product that was not that complex: cache, search, database, blob store, distributed, etc.

And with all our previous 1990s and 2000s experience in data warehousing, business intelligence and DB optimization tools, we were still wasting valuable time on SQL design, indexing, query planning and parameter optimization.

So we gave MongoDB a try. First as a cache. Later as the only DB.

Our journey:

1. Heard about Mongo. Tried the DB. The driver worked great. To me that's the number one "marketing antics behind MongoDB": their strategy of creating drivers and supporting the programmer community.
2. Understood what NoSQL meant and forgot about joins altogether.
3. Understood what NoSQL meant and built transactions into atomic documents.
4. Understood what NoSQL meant and stopped relying on the database for type, primary and foreign key constraints, default values, triggers (argh!), stored procedures (2x argh!), etc.
5. Simplified the architecture with integrated search, queue and cache. Fewer moving parts = joy.
6. Result: very low maintenance, easy install, configuration, replication and migrations. 99.999% availability.
7. Bonus: we even implemented a very high frequency, atomic distributed semaphore system with a FIFO queue that reaps zombies using Mongo's built-in networking features.
So we've reduced DB-related issues by an order of magnitude. How? I think because NoSQL is a way of saying the DB should not be magically answering random queries. A database should be a data store, period: just store and retrieve data the way the app needs it. Focusing our energies on getting the data right as documents for a document store meant data flows as objects from code in and out of Mongo. I believe people underestimate how important (and productive) it is to keep the same data structures flowing between the UI (JSON), server (Object/Hash/Dictionary) and DB (document). It makes code easier to read and more resilient to errors.

But SQL DBs come with a convenience layer bolted on to run random user queries with things like OUTER joins and GROUP BYs. For that we need to flatten data into tables, which clashes with how data typically flows in an app. SQL DBs, however, are great as the single source of truth for data: a schema can be laid out and enforced independently of code, so it's safely guarded from programmers breaking it. A business sets up a SQL DB so that its reporting people can query data on demand, while consultants with zero knowledge of the business can write code limited by constraints managed by DBAs. SQL is even taught at business schools, which is revealing of who its target audience actually is.

Bottom line: SQL and schema enforcement are end-user features we did not need to build our tool. On the other hand, every single MongoDB feature is something we need and use profusely. |
How would you represent a simple invoicing system in MongoDB (e.g. Customers + Products + Orders + OrderLineItems)? NoSQL-for-everything advocates posit two solutions: either denormalize the data by embedding Customer information within an Order document, which also contains an array of OrderLineItems, or use a UUID as a kind of foreign key and maintain separate relationships. Both approaches have serious problems: data duplication and inevitable inconsistency in the first, and lack of referential integrity in the second, besides ending up abusing a NoSQL database as an RDBMS. Is there a better way? Or would you agree that certain classes of problems are best left to the RDBMS domain?
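
To make the two options concrete, here is a minimal sketch of both document shapes. It uses plain Python dicts as stand-ins for MongoDB collections, so it runs without a database. Every name in it (`customers`, `place_order_embedded`, `order_total_referenced`, the field names, prices in integer cents) is a hypothetical choice for illustration, not anything taken from a real schema or driver API.

```python
import uuid

# Hypothetical in-memory "collections", keyed by id. These stand in for
# MongoDB collections so the sketch needs no server or driver.
customers = {}
products = {}
orders = {}


def new_id():
    """UUID string used as a document id (the 'kind of foreign key')."""
    return str(uuid.uuid4())


def add_customer(name, email):
    cid = new_id()
    customers[cid] = {"_id": cid, "name": name, "email": email}
    return cid


def add_product(name, price_cents):
    pid = new_id()
    products[pid] = {"_id": pid, "name": name, "price_cents": price_cents}
    return pid


def place_order_embedded(customer_id, items):
    """Approach 1: copy customer and product data into the order document."""
    customer = customers[customer_id]
    oid = new_id()
    orders[oid] = {
        "_id": oid,
        "customer": {"name": customer["name"], "email": customer["email"]},
        "line_items": [
            {
                "name": products[pid]["name"],
                "unit_price_cents": products[pid]["price_cents"],
                "qty": qty,
            }
            for pid, qty in items
        ],
    }
    return oid


def place_order_referenced(customer_id, items):
    """Approach 2: store only ids and resolve them in application code."""
    oid = new_id()
    orders[oid] = {
        "_id": oid,
        "customer_id": customer_id,
        "line_items": [{"product_id": pid, "qty": qty} for pid, qty in items],
    }
    return oid


def order_total_embedded(order_id):
    # Everything needed is inside the one document.
    return sum(
        li["unit_price_cents"] * li["qty"]
        for li in orders[order_id]["line_items"]
    )


def order_total_referenced(order_id):
    # Application-side "join". Nothing stops a product from being deleted,
    # so a dangling product_id surfaces here as a KeyError.
    return sum(
        products[li["product_id"]]["price_cents"] * li["qty"]
        for li in orders[order_id]["line_items"]
    )
```

In this sketch the embedded order keeps reporting the copied price after the product's price changes (the duplication and drift described above), while the referenced order follows the change but fails once the product is deleted, since nothing enforces the reference.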