Hacker News new | ask | show | jobs
by lostjohnny 2136 days ago
> I would point to MongoDB's query language

seriuosly?

    db.orders.aggregate([
       {
          $lookup:
             {
               from: "warehouses",
               let: { order_item: "$item", order_qty: "$ordered" },
               pipeline: [
                  { $match:
                     { $expr:
                        { $and:
                           [
                             { $eq: [ "$stock_item",  "$$order_item" ] },
                             { $gte: [ "$instock", "$$order_qty" ] }
                           ]
                        }
                     }
                  },
                  { $project: { stock_item: 0, _id: 0 } }
               ],
               as: "stockdata"
             }
        }
    ])
VS

    SELECT *, stockdata
    FROM orders
    WHERE stockdata IN (SELECT warehouse, instock
                        FROM warehouses
                        WHERE stock_item= orders.item
                        AND instock >= orders.ordered );
4 comments

Another huge win for SQL is that it's easy to construct from parts. You can very easily run and debug your subquery or common table expression on its own before combining it into a larger, more-complex query. If you (as I usually do) create plenty of views while analysing a dataset, the approach can be extremely powerful.

Doing the same in JavaScript is possible, but it's slow and cumbersome by comparison.

The original article addresses the deficiency in construction from parts, in its "Lack of Orthogonality" section.

MongoDB queries, while being interpretable by javascript, aren't really javascript. You can't interact with the data using javascript (well, you can, using eval, but you shouldn't). You interact with the data via the query language, which is, again, expressed in JS, just like SQL is expressed in English.

It's more accurate to consider the Aggregation Pipeline as being the "composable" system to get at data in MongoDB. And its exceedingly composable; far more than SQL. It's literally a pipeline; a series of steps which fetch, mutate, filter, map, limit, calculate, correlate, relate, and otherwise interact with the data in a database. Each step operates on the output of the previous step, in series. You can programmatically swap steps in-and-out, in production, with no string manipulation or ORM, debug each step in series, remove steps, see the output, get performance characteristics on each step. There's no complex black-boxed query execution planner or compiler, because the query plan is the pipeline.

> with no string manipulation or ORM

OR, another way to look at it, using MongoDB's ORM, which is quite bad if you ask me.

If you try to directly translate SQL to MongoDB, without changing your schemas or query, then yes, it's gonna look bad. That doesn't make for a good comparison, and I think you know that.
> That doesn't make for a good comparison, and I think you know that.

That query is not my invention, it comes directly from MongoDB documentation

If it looks bad, it means it is bad by design

https://docs.mongodb.com/manual/reference/operator/aggregati...

MongoDB is better because it's Web Scale!

http://www.mongodb-is-web-scale.com/

you're comparing $lookup, an operator that nosql isn't designed for to a join, an operator sql was designed for
Lookups are very common in MongoDB; Starting with SQL, lifting the data as-is into Mongo, and translating the queries 1:1 will just result in garbage queries, like that one.

An "idealized" NoSQL schema is far, far more complex than anything anyone used to SQL would arrive at ([1]), but most of that is because in a "pure" NoSQL/Document-oriented database, the query engine simply isn't that powerful (think Dynamo). MongoDB has an inordinately powerful array of tools to get at data in a performant way, and $lookup is available as one of those tools. Can it be misused? Yeah; just look at the parent comment to see clear misuse. But generally, it's very common to see.

Modern thinking around MongoDB schema design is closer to SQL than NoSQL/Dynamo. Arrays are bad, denormalization can be valuable but use sparingly, that kind of stuff.

[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

I wouldn't say arrays are bad, and the entire paradigm of data modeling in mongodb is to store your data based on your application usage patterns. if you have to query across multiple collections via a $lookup, then maybe you'd benefit from embedding the smaller of those collections into the former.
Maybe. But, as a general rule, I advise against arrays of unbounded size on documents (arrays of a known bounded size, say, containing enums to act as a multi-value flag, or email addresses on a user's account, or something like that, are fine).

One example: We used mongodb to track the state of a general CSV import system. So, we'd have a document for each csv file a user imported, and on that document, we were storing errors which occurred during the import to later display to the user. Of course, in an array. Worked great for years, until one day, a user uploaded a very bad CSV, non-maliciously, with hundreds of thousands of lines, with dozens of errors on each line, generating an array millions of items large. The failure condition here was wild: the import just got slower, and slower, and slower, until eventually the (modestly provisioned) db cluster started failing. We immediately normalized that array into its own collection, re-ran the import, still generated millions of errors, but with no problem.

As always, a general rule doesn't apply in every situation, but in my experience, unless you have a really strong grasp on how a system's use will scale, years into the future, unbounded arrays are icky. Lookups across two collections are only a modest performance loss over a direct query, and if the arrays get up there in size, they can actually be faster.

Interesting, we ran into a very similar thing at my current place where we have a CSV importer storing results in mongo. Importantly it also stores the errors that occurred during the import.

In our case someone uploaded a _huge_ CSV with some misalignment in the columns so every row had an error.

The resulting the mongo document was larger the max document size (16MB?), so it couldn't even save to the database.

Mongo is painful to work with and I feel like I keep finding more reasons to hate it.