Hacker News new | ask | show | jobs
by quietbritishjim 2744 days ago
> IMO, the reason is that newer developers faced the choice of learning SQL or learning to use something with a Javascript API.

The thing I dislike about this type of comment – although I now notice yours doesn't explicitly say this – is the implication that devs don't like SQL because they're lazy or stupid. Well, sometimes that is probably true! But there are some tasks where you need to build the query dynamically at run time, and for those tasks MongoDB's usual query API, or especially its aggregation pipeline API, are genuinely better than stitching together fragments of SQL in the form of text strings. Injection attacks and inserting commas (but not trailing commas) come to mind as obvious difficulties. For anyone not familiar, just look at how close to being a native Python API pymongo is:

    pipeline = [
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
        {"$sort": SON([("count", -1), ("_id", -1)])}
    ]
    result_cursor = db.things.aggregate(pipeline)
Of course you could write an SQL query that does this particular job and is probably clearer. But if you need to compose a bunch of operations arbitrarily at runtime then using dicts and lists like this is clearly better.

Of course pipelines like this will typically be slow as hell because arbitrary queries, by their nature, cannot take advantage of indices. But sometimes that's OK. We do this in one of our products and it works great.

With JSONB and replication enhancements, Postgres is close to wiping out all of MongoDB's advantages. I would love to see a more native-like API like Mongo's aggregation pipeline, even if it's just a wrapper for composing SQL strings. I think that would finish off the job.

6 comments

Elixir's primary database wrapper, Ecto [0], lets you dynamically build queries at runtime, and also isn't an ORM. Here's two examples directly from the docs:

  # Query all rows in the "users" table, filtering for users whose age is > 18, and selecting their name
  "users"
  |> where([u], u.age > 18)
  |> select([u], u.name)

  # Build a dynamic query fragment based on some parameters
  dynamic = false
  
  dynamic =
    if params["is_public"] do
      dynamic([p], p.is_public or ^dynamic)
    else
      dynamic
    end
  
  dynamic =
    if params["allow_reviewers"] do
      dynamic([p, a], a.reviewer == true or ^dynamic)
    else
      dynamic
    end
  
  from "posts", where: ^dynamic
Across all the different means of interacting with a database I have experience with (from full-fledged ORMs like ActiveRecord, to sprocs in ASP.NET), I've found that it offers the best compromise between providing an ergonomic abstraction over the database, and not hiding all of the nitty-gritty details you need to worry about in order to write performant queries or use database-specific features like triggers or window functions.

My main point, though, is that you don't need to reach for NoSQL if all you need is a way to compose queries without string interpolation.

[0] https://github.com/elixir-ecto/ecto

As I said to a sibling response, this is not a substitute for Mongo's aggregation pipeline unless it can do analogous things to Postgres's JSONB fields. For example, can it unwind an array field, match those subrecords where one field (like a "key") matches a value and another field (like a "value") exceeds an overall value, and then apply this condition to filter the overall rows in the table?

Also, one of the benefits of Mongo's API is that it has excellent native implementations in numerous languages (we already use C++ and Python), so a suggestion to switch language entirely is not really equivalent.

> As I said to a sibling response, this is not a substitute for Mongo's aggregation pipeline

Huh? The aggregation framework is a solution to a mongo-only problem. Most other databases are performant, but Mongo suffers wildly from coarse locking and slow performance putting things into and retrieving things from the javascript VM.

> For example, can it unwind an array field, match those subrecords where one field (like a "key") matches a value and another field (like a "value") exceeds an overall value, and then apply this condition to filter the overall rows in the table?

This sounds suspiciously like a SQL view.

Edit: But if you actually need an array in a cell, Postgres has an array type that's also a first-class citizen with plenty of tooling around it.

The "this" was referring to dynamically building queries (the GP comment by me) in Ecto (the parent comment by QuinnWilton). What you've said is a non-sequitur in the context of this little discussion. My whole original point is that raw SQL isn't right in all situations, and you appear to be arguing that I just use SQL instead.
I can't speak to every ORM or database interface in existence but ActiveRecord will happily handle Postgres arrays and let you use the built-in array functions just handily without having to write queries by hand. Ecto is less elegant, but you can still finangle some arrays with it.

As far as views are concerned, I don't know what to tell you. Sure, you'll probably have to craft the view itself by hand. The result is that you can then use most abstractions of your choosing on top of it though.

There's also the possibility of using automation to create, update, and manage views. That lets your app be 'dynamic' with regards to new data and new datatypes, but also preserves the performance, debugging, segregation, and maintenance benefits of the underlying DB.
>For example, can it...

Yes. There will be a subquery and jsonb indexes need to be thought out in order to make it fast

> Across all the different means of interacting with a database I have experience with (from full-fledged ORMs like ActiveRecord, to sprocs in ASP.NET), I've found that it offers the best compromise between providing an ergonomic abstraction over the database, and not hiding all of the nitty-gritty details you need to worry about in order to write performant queries or use database-specific features like triggers or window functions.

Ahh Elixir. My favorite language that really just tries so hard to shoot itself in the foot. I'm currently in the protracted process of trying to upgrade a Phoenix app to the current versions. Currently I'm at the rewrite it in Rust and try out Rocket + Diesel stage.

Diesel is... interesting and makes me long for Ecto (which is often used as an ORM although the model bits got split off into a different project).

Love the downvotes instead of comments. I've walked away from Elixir as the best practice deployment methodology (Distillery) is non-op on FreeBSD[1] and has been for a few months while the Distillery author is mum. All of this despite the vast love that the Elixir community seems to heap on FreeBSD.

Erlang and Elixir have plenty of promise but there simply is no good story for production deployments. Distillery and edeliver approximate capistrano, and that sounds great when it works (although I'd just as soon skip edeliver). But when it doesn't I'd much rather dig into the mess of ruby that is Capistrano than the mess of shell scripts, erlang, and god knows what else goes into a Distillery release.

Elixir is a really interesting language, but Phoenix seems to still be pretty wet behind the ears and very much in flux. Ecto too to a much smaller extent.

1: Some of the distillery scripts can communicate with epmd, some just give up.

Well... you can also use a modern ORM. I think "stitching ... text strings" is definitively not the way to go when interfacing a SQL database. My go-to ORM is Sequel[1]. I think their API is one of the best I've seen: you can choose to use models, but you can also work directly with "datasets" (tables or views, or queries) and compose them as you like. It's really powerful and simple.

[1]: http://sequel.jeremyevans.net/

> genuinely better than stitching together fragments of SQL in the form of text strings. Injection attacks and inserting commas (but not trailing commas) come to mind as obvious difficulties.

You're using the Pymongo library as an example. Someone can just as easily use SQLAlchemy and not have to worry about those things.

I'm confused by the implication that someone doing things like the above would be writing in SQL. SQL is a little like assembly language in a game: You may need to drop down to it for some key highly-optimized areas, but you rarely need to directly use it for most tasks. While it's true that you should understand how it works so you don't generate queries that suck performance-wise, the same goes for Mongo's intricacies too.

Every language I know of has great ORMs which do this for whatever SQL flavors people tend to use on that platform. I write things like this all the time, and it gets turned into SQL for Postgres:

```` Article.where(author_id: 37).order(:modified_date, :desc).where.not(published: false) ````

When using an ORM correctly (and indeed, the less I'm using any of my own bits of SQL the more this is true) I am also protected against injection attacks.

I'm not saying NoSQL has no value, but I believe it to be the wrong tool for data that lends itself to an RDBMS. If you have a bunch of documents who have deeply nested or inconsistent structures and where it makes no sense that you'd want to query by something other than the primary key, sure, it's a no-brainer to use a NoSQL system. For a CMS, which has been implemented thousands of times in RDBMSs, it is madness though. I cringe at realizing that apaprently there are developers out there who have avoided learning SQL entirely in their career out of fear, and as a result have to use Mongo for every application because that's the only thing they know how to do. I'm sure they're out there, but I wouldn't hire one.

I’ll try to avoid a flame war, but since you’re using python, SQLAlchemy allows for composing sql strings.
Yes, an ORM, not SQL itself.
You can compose queries, using the queryAPI in SQLAlchemy, without touching the ORM.
SQLAlchemy is not an ORM. There’s a companion ORM project if you want it, but it’s not necessary.
Yes? What's the qualitative difference between using a best-of-breed SQL ORM and the Mongo API?
For it to replace MongoDB's aggregation pipeline, it would need to play nicely with JSONB. Does it do that? This is the thing I'm really missing.

For example, if documents in the JSONB column all look roughly like this:

    {
        "someArrayField": [
            { "key": "steve", "value": 7 },
            { "key": "bob", "value": 15 },
        ],
        "someOtherField": [ "whatever" ]
    }

* Can I count the number of entries in someArrayField, summed across all records?

* Can I get the per-record mean of the "value" sub-field, summed across all records?

* Can I filter by records that have a "someArrayField" entry where "key" is "steve" and "value" is at least 10 (the above record should NOT match)?

Yes, you can!

The jsonb_array_elements function is roughly similar to Mongo’s $unwind pipeline op. It explodes a JSON array into a set of rows. From there it’s pretty simple aggregates to achieve what you’re looking for.

I was evaluating Mongo a couple months back to solve roughly the same problems. Eventually discovered Postgres already had what I was looking for.

More the point Postgres has an actual array data type (and has for a while). You don't need to shove everything into a JSON/JSONB blob unless you absolutely cannot have any sort of schema.
Not only arrays, you can, with some limitations, create proper types with field names, if your ORM supports that you should use that over JSONB if it fits.
See also: PostGIS.
Allow me to restate this question:

> Does it do that?

It was supposed to be clear from the context that this meant:

> Does building queries programmatically with SQLAlchemy do that?

Maybe I'm misreading your comment, but you seem to just be talking about writing queries directly in SQL.

If not, could you give an example/link of how to programmically build a query in SQLAlchemy that dynamically makes use of jsonb_array_elements? It would be hugely useful if I could do that.

There are some old examples of how to use jsonb_array_elemens in SQLAlchemy here: https://github.com/sqlalchemy/sqlalchemy/issues/3566#issueco...
I was speaking of SQL, but if you can write it in SQL you can usually map it to SQLAlchemy. If worse comes to worse, you can use text() to drop down to raw SQL for just a portion of the query.

SQLAlchemy’s Postgres JSONB type allows subscription, so you can do Model.col[‘arrayfield’]. You can also manually invoke the operator with Model.col.op(‘->’)(‘arrayfield’).

So you should be able to do something like:

func.sum(func.jsonb_array_elements(Model.col.op(‘->’)(‘arrayfield’)).op(‘->’)(‘val’))

(Writing on my mobile without reference, so may not be fully accurate)

It absolutely can, but in my experience, 99% of the time, choosing to make a data field JSON/JSONB ends up being a mistake.
To add to the pile of responses: in Scala, Slick is great library that lets you compose sql queries and fragments of queries quite effectively. (http://slick.lightbend.com/)

At my company we built a UI on top of Slick that lets users of our web app define complex triggers based on dynamic fields and conditions which are translated to type-safe SQL queries.

QueryDSL and jOOq as well for java.