Hacker News
by rgo 3217 days ago
Here's my personal version of the story:

With SQL DBs (Oracle, SQLServer and MySQL):

1. SQL database migrations were killing us. Going back and forward in a dev environment was impossible. No hot deploy in production.

2. Could not work well with application user-defined fields: adding columns ad hoc to the database, indexing them, normalizing and denormalizing, performance issues, everything was a problem.

3. Blobs holding logging data got unmanageable quickly.

4. Joins were very hard to optimize even though the team had a lot of DBA experience fine-tuning databases.

5. Had to build a very complex architecture around the database for a product that was not that complex: cache, search, database, blob store, distributed, etc.

And with all our previous 1990s and 2000s experience in data warehousing, business intelligence and DB optimization tools, we were still wasting valuable time on SQL design, indexing, query planning and parameter optimization. So we gave MongoDB a try. First as a cache. Later as the only DB.

Our journey:

1. Heard about Mongo. Tried the DB. The driver worked great. To me that's the number one "marketing antics behind MongoDB": their strategy creating drivers and supporting the programmer community.

2. Understood what NoSQL meant and forgot about joins altogether.

3. Understood what NoSQL meant and built transactions into atomic documents.

4. Understood what NoSQL meant and stopped relying on the database for type, primary and foreign key constraints, default values, triggers (argh!), stored procedures (2x argh!), etc.

5. Simplified the architecture with integrated search, queue and cache. Fewer moving parts = joy.

6. Result: very low maintenance, easy install, configuration, replication and migrations. 99.999% availability.

7. Bonus: we even implemented a very high frequency, atomic distributed semaphore system with a FIFO queue that reaps zombies using Mongo built-in networking features.

So we've reduced DB-related issues by an order of magnitude. How? I think because NoSQL is a way of saying the DB should not be magically answering random queries. A database should be a data store, period -- just store and retrieve data the way the app needs it. Focusing our energies on getting the data right as documents for a document store meant that data flows as objects from code in and out of Mongo.

I believe people underestimate how important (and productive) it is to keep the same data structures flowing between the UI (JSON), server (Object/Hash/Dictionary) and DB (document). It makes code easier to read and more resilient to errors.
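A minimal sketch of that idea, using only Python's standard library. The field names are invented, and the `store` dict is just a stand-in for a document collection, not a real MongoDB call:

```python
import json

# Hypothetical payload as a UI might send it (field names are invented).
payload = '{"user": "ann", "tags": ["a", "b"], "prefs": {"theme": "dark"}}'

# Server side: the JSON becomes a plain dict with the same shape.
doc = json.loads(payload)

# Stand-in for a document collection: an in-memory dict keyed by id.
store = {}
store["u1"] = doc

# Reading it back and serializing it for the UI needs no mapping layer.
response = json.dumps(store["u1"], sort_keys=True)
roundtrip_ok = json.loads(response) == json.loads(payload)
```

The nested `prefs` object and the `tags` list keep their shape at every step, which is the property being described; with a relational schema they would typically be split across tables and reassembled.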

But SQL DBs come with a convenience layer bolted on to run random user queries with things like OUTER joins and GROUP BYs. For that we need to flatten data into tables, which clashes with how data typically flows in an app.

SQL DBs however are great as the single source of truth for data: a schema can be laid out and enforced independently of code, so it's safely guarded from programmers breaking it. Business sets up a SQL DB so that their reporting people can query data on demand while consultants with zero knowledge of the business can write code limited by constraints managed by DBAs. SQL is even taught at business schools, which is revealing of who its target audience actually is.

Bottom line: SQL and schema enforcement are end-user features we did not need to build our tool. On the other hand, every single MongoDB feature is something we need and use profusely.

3 comments

> 2. Understood what NoSQL meant and forgot about joins altogether.

How would you represent a simple invoicing system in MongoDB (e.g. Customers + Products + Orders + OrderLineItems)? NoSQL-for-everything advocates posit two solutions: either denormalize the data by embedding Customer information within an Order document, which also contains an array of OrderLineItems, or use a UUID as a kind-of foreign key and maintain separate relationships. Both approaches have serious problems (data duplication and inevitable inconsistency in the first, and lack of referential integrity in the second, besides ending up abusing a NoSQL database as an RDBMS). Is there a better way? Or would you agree that certain classes of problems are best left to the RDBMS domain?

The example you've used (invoices) is actually quite instructive for demonstrating the benefits of a "document store." An invoice, historically, was a literal printed piece of paper. Invoices are actually really annoying to implement in an RDBMS because of so-called "referential integrity" -- an invoice should be a "snapshot in time" of everything that happened when the order was processed, so ideally, when a user views their invoices from the past 2 years, they look the same every time.

Except, oops, your user got married and moved, now your precious "referential integrity" means jack because the generated invoice is flat-out wrong. Product removed from the store? Too bad, needs to stay in the database forever for historical purposes. Prices need to change? Better design the database to handle snapshots of every product state.

If you were implementing this in MongoDB, you'd probably store a UUID and the flattened data at the time of invoice generation, that way you can still query on ids AND not deal with the headache of having a combinatorial explosion of data in your RDBMS.
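Roughly what that could look like, as a sketch with plain Python dicts standing in for documents. The record shapes, field names and the `make_invoice` helper are all hypothetical:

```python
import copy
import uuid

# Hypothetical live records; all names and values are invented.
customer = {"_id": str(uuid.uuid4()), "name": "A. Smith", "address": "1 Old Road"}
product = {"_id": str(uuid.uuid4()), "title": "Widget", "price_cents": 1999}


def make_invoice(customer, lines):
    """Build an invoice document that copies the data as it is right now."""
    return {
        "_id": str(uuid.uuid4()),
        "customer_id": customer["_id"],       # still queryable by id
        "customer": copy.deepcopy(customer),  # frozen snapshot
        "lines": [
            {
                "product_id": p["_id"],
                "title": p["title"],
                "unit_price_cents": p["price_cents"],
                "qty": qty,
            }
            for p, qty in lines
        ],
    }


invoice = make_invoice(customer, [(product, 2)])

# Later changes to the live records do not touch the stored invoice.
customer["address"] = "2 New Street"
product["price_cents"] = 2499

total_cents = sum(l["unit_price_cents"] * l["qty"] for l in invoice["lines"])
```

The invoice keeps the old address and the old price, while `customer_id` and `product_id` still let you find every invoice for a given customer or product.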

You would solve this in an RDBMS the same way: denormalize when you're saving the invoice (example: a line items table with a snapshot of the current item price, description, etc.)
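A sketch of such a line items table, using Python's built-in `sqlite3` so it runs anywhere. Table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        price_cents INTEGER NOT NULL
    );
    CREATE TABLE invoice_lines (
        invoice_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,        -- kept for lookups
        description TEXT NOT NULL,          -- copied at invoice time
        unit_price_cents INTEGER NOT NULL,  -- copied at invoice time
        qty INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO products VALUES (1, 'Widget', 1999)")

# Copy the current description and price into the line item.
conn.execute("""
    INSERT INTO invoice_lines
    SELECT 100, id, description, price_cents, 2 FROM products WHERE id = 1
""")

# A later price change leaves the invoice line untouched.
conn.execute("UPDATE products SET price_cents = 2499 WHERE id = 1")

line_total = conn.execute(
    "SELECT unit_price_cents * qty FROM invoice_lines WHERE invoice_id = 100"
).fetchone()[0]
```

The invoice total is computed from the copied price, so it stays the same no matter what happens to the `products` row afterwards.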

Yes, which suggests that the "serious problems" mentioned by the grandparent aren't serious (or problems) at all.

In Postgres, you'd simply have a table with a JSON column for the snapshot-in-time contract.

You can then select fields from that JSON for invoices, reports, etc with the arrow operator:

https://www.postgresql.org/docs/9.6/static/functions-json.ht...

With SQL you can denormalize all that (and should) to create that snapshot. But with NoSQL you can't normalize and get back a way to quickly query the number of products sold per month over the last 5 years.
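For reference, the kind of query meant here, sketched with Python's built-in `sqlite3`. The `invoice_lines` table, its columns and the sample rows are invented, and the five-year date filter is left out to keep the example deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice_lines (
        sold_on TEXT NOT NULL,        -- ISO date, e.g. '2017-01-15'
        product_id INTEGER NOT NULL,
        qty INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO invoice_lines VALUES (?, ?, ?)",
    [
        ("2017-01-05", 1, 2),
        ("2017-01-20", 2, 1),
        ("2017-02-03", 1, 4),
    ],
)

# Units sold per calendar month.
rows = conn.execute("""
    SELECT strftime('%Y-%m', sold_on) AS month, SUM(qty) AS units
    FROM invoice_lines
    GROUP BY month
    ORDER BY month
""").fetchall()
```

Here `rows` comes back as one `(month, units)` tuple per month. On a real table you would add a `WHERE sold_on >= ...` bound and probably an index on `sold_on`.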

Yes, this is possible with Aggregation and MapReduce: https://docs.mongodb.com/manual/aggregation/

For relative values of "quickly".

Instead of nebulous terms like NoSQL you should instead just look at the damn features because these concepts are orthogonal. MongoDB has transaction isolation on the document level instead of the database level. If you can store everything in a single document then it doesn't matter. If you can't then use a database that supports database level transactions. It doesn't matter if it's a NoSQL or RDBMS database.

I feel a lot of people know that typical NoSQL databases (without database-level transactions) are not suitable for their problem, but they don't know why. They then conclude that NoSQL is always bad and RDBMSs are always better, when those NoSQL databases are simply intended for different problems.
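To make the distinction concrete, here is a sketch of the case where the data does not fit in one record and a database-level transaction is what you want. It uses Python's built-in `sqlite3`; the `accounts` table and the `transfer` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two rows; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer credits account 2 first and then fails the `CHECK` on account 1, so the whole transaction is rolled back and the credit disappears with it. With only per-document atomicity, that undo step would be the application's job.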

Not the original commenter, but there are some valid cases for NoSQL: some people use it for storing massive amounts of web crawling data. But the thing here is that it's throw-away'ish, and in that case it's often not worth it to add structure (even though there pretty much is structure in everything you look at long enough).

But I do think having any data consisting of, say, items, orders, users, payment in MongoDB is very much a bad idea. Been there.

> I think because NoSQL is a way of saying the DB should not be magically answering random queries.

The reason this is wrong is something that Codd et al learned a while ago: the data is MORE IMPORTANT than the application. Applications change and/or become obsolete; the data doesn't. You will still need to query the same database 50 years from now, but you likely won't have the same application to do it with. That means that everything that is important to the data (schema, constraints and so on) needs to stay with the data.
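A small sketch of what "staying with the data" means in practice, using Python's built-in `sqlite3`. The tables and the `try_insert` helper are invented for illustration; the point is that the rules are enforced whichever application does the writing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enables this per connection
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
    INSERT INTO customers VALUES (1, 'Ann');
""")


def try_insert(sql, params):
    """Attempt a write and report whether the schema accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 1, 500)),   # valid
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (2, 99, 500)),  # no such customer
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (3, 1, -5)),    # negative total
]
```

Only the first insert gets through. A replacement application written decades later would hit the same foreign key and `CHECK` rules without having to reimplement them.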

What was your tool?