Hacker News new | ask | show | jobs
by bunderbunder 528 days ago
> ... and this is even less obvious if you’re used to using ORMs.

Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.

As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.

4 comments

Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose relational queries dynamically, with the full power of your primary language, instead of inside of database stored procedures in a language totally devoid of any support for basic software engineering best practices. They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it! SQL is not the only possible implementation of the relational model, and it's not even a particularly good one. Even SQL's founders don't think it implements EF Codd's relational model very faithfully. Good ORMs act as a domain-specific language for the relational model embedded inside the parent language.
"compose relational queries dynamically"

That's an important one. It would be super nice to have a SQL dialect that works more like LINQ where you can compose your queries easily. I always hate it when I have to write SQL directly. It's super powerful but the syntax just isn't designed well. To me it feels like a throwback to the good old FORTRAN or COBOL days: you can get stuff done but modern languages are so much better.

>They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it!

But this has never been their primary purpose and it's not what they are good at. ORMs are supposed to map the relational model into an object oriented model so that you can work with objects rather than sets of tuples. And that's exactly how people use them.

ORMs incentivise people to replace simple and declarative set operations with complex procedural code operating on individual objects.

ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.

Exposing the relational model to be manipulated dynamically within the parent language is exactly what LINQ in C# is. That was its primary purpose. True, LINQ is not itself an ORM -- it was built to support other ORMs like LINQ to SQL and Entity Framework, which aren't as "pure" on this subject. I don't actually like the LINQ syntax that much since it's not as extensible, but its existence is proof that the C# team did in fact intend to expose the relational model to C#.

Entity Framework did try to cater to the "SQL is scary, let me use objects" crowd, and that is the majority of how it's used, and that is a mistake in my opinion. But it is also very good at supporting relational algebra within C# and composing queries dynamically; ironically, it's best at it if you disable or avoid many of its features (dynamic subclassing, linked objects). Pass IQueryables around and compose them together and life is good. Updating needs work, but updates have always been a weakness for SQL too.

> ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.

and that's why ORMs are so unpopular and entirely absent from successful production applications for the past 30 years

All ideas that were popular for a while are basically guaranteed to be in some successful applications. That includes bad ideas.
Are you sure about this?
yeah I mean, isn't all of this true for ANY abstraction? Once you're off the beaten path, they're all hard to debug and optimize, they introduce extra complexity, etc. BECAUSE they are attempting to abstract away certain details

This is true for an HTTP library as much as it is an ORM.

>yeah I mean, isn't all of this true for ANY abstraction?

No, it is only true for bad abstractions.

My experience with ORMs is that most of the time you end up needing to write some queries by hand, in raw SQL. Usually these are the most complex, that you can't express in your ORM's DSL. My point being, I don't think using an ORM really shields you from having to learn how it works behind the scenes.
It's not even about having to write SQL by hand. In an ORM like Django that's exceedingly rare. But you still need to understand what's going on underneath. In other words, it's the most leaky abstraction there is. I think the popularity is mostly aesthetic and convenience. Most people into ORMs like Django don't really know about layered architecture and that you can keep all your SQL in one place in the data access layer. They just scatter that stuff everywhere in the codebase.
I don't know Django specifically but I'm always floored by how people talk about ORMs. They're only a leaky abstraction if you believe their point is to shield terrified junior devs of the inner workings of the scary relational database. That's an awful way to use ORMs, and the source of most of the flak they get. To be fair, some are designed that way, or at least strongly push you toward it.

Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose SQL dynamically, with the full power of your language. SQL is an awful language to write application logic in, because it has horrible support for abstraction, composition, encapsulation, dependency injection, etc. The ORM gives you a way to produce SQL in an environment that actually supports basic software engineering principles. Scattering ORM logic everywhere in the codebase is the point: putting all your SQL in one data access layer is like putting all your arithmetic in one calculation layer. Why would you ever do that? What's wrong with seeing a plus sign in more than one file? What's wrong with seeing language-encoded relational logic in more than one file?

I can guarantee you the popularity is not "aesthetic". And convenience is a real thing that actually does reduce costs. People complain about ORMs, but have you seen the absolute horse-shit-level code that people jam into SQL functions and procedures to do the utterly most basic things? The standard for what ends up in SQL Stored procedures is the most unmaintainable garbage in the entire software engineering ecosystem.

Are you thinking of something like LINQ or SQLAlchemy Core? You do not need to use an ORM to interface with a SQL database with basic software engineering principles.

The reason for a data layer is because the underlying data representation might change. For example, you might change the schema to handle some new performance requirement. Now you have to hunt down and change everywhere you've queried that table in your whole code base. Every time you directly call SQL you are coupling your code strongly to the database schema which is ultimately an implementation detail.

> And convenience is a real thing that actually does reduce costs.

I know convenience is a real thing. I also know that it very often increases costs in the long run.

> Are you thinking of something like LINQ or SQLAlchemy Core? You do not need to use an ORM to interface with a SQL database with basic software engineering principles.

Well you certainly can't do it all in SQL, because SQL doesn't support basic software engineering principles at all. That means you're picking a parent language that does support the principles you want, and attempting to build a library that allows you to build, pass around, and compose relational queries in a way that leverages the benefits of the parent language. To do this I'd argue you need these things:

1. a structured representation of the (possibly incomplete) relational query as it's being built, e.g. SQLAlchemy's internal representation or IQueryable in C#

2. a method of building those representations in a way that conforms to the norms of the parent language and leverages its benefits as much as possible, e.g. LINQ or direct use of the various extension methods on IQueryable. It seems like you could pick either SQLAlchemy or SQLAlchemy Core depending on whether you want to think in terms of objects or queries (I'd usually pick queries), but I'm not that familiar with SQLAlchemy.

3. a method of translating the query representation into a particular SQL dialect.

I don't know what exactly your definition of "ORM" is, but I'd argue that if any of those steps involve an object-level representation of the schema of your tables (e.g. "class Employee { int ID }" type of thing) then it's an ORM. Do you need that? Well, no, probably not, but why not? C# is strongly typed, so why wouldn't I tell its type system what the schema of my table is? That's part of leveraging the benefits of the parent language.

> The reason for a data layer is because the underlying data representation might change. For example, you might change the schema to handle some new performance requirement. Now you have to hunt down and change everywhere you've queried that table in your whole code base.

This is an argument for strong typing, not for putting all your code that has the word "select" in it in the same file. And building queries with a strongly typed representation of your schema in the parent language is the point of the ORM!

Without an ORM, you still have to hunt down tons of spots where things change when you change your schema. How do you ever not have to do this?

> Every time you directly call SQL you are coupling your code strongly to the database schema which is ultimately an implementation detail.

There exists some canonical relational model of your domain. This model is not an implementation detail, it is the foundation of most of your code. I choose to keep my SQL schema as close as possible to this canonical relational model as I can. The fact that a Monitoring Instrument has an Operating Status is a fundamental fact about my model; whether that column is called "operating_status" or "OperatingStatus" or "operating_status_ID" is an implementation detail. It's just hard to agree with you that "the database schema" is merely implementation detail -- clearly parts of it are not, unless it's completely untethered from your canonical domain model (which would be a nightmare, of course). Of course your code is strongly coupled to the fundamental domain model upon which it is operating. I'd certainly agree that it'd be nice to not have to worry about implementation details like the exact naming convention of the columns -- which, of course, ORMs let you do.

>> And convenience is a real thing that actually does reduce costs.

> I know convenience is a real thing. I also know that it very often increases costs in the long run.

Code that is easier to write is generally also easier to read and maintain. That's what I mean when I say "convenience". Simplicity and convenience go together.

> I don't know Django specifically but I'm always floored by how people talk about ORMs.

Django seems to be an outlier on the side of "an ORM that's actually good". Whenever people have specific technical complaints about ORMs, it's generally not a problem or there's already a solution in Django you just have to use. It's usually only when you get to the more conceptual stuff like object-relational mismatch that such complaints tend to apply to Django.

If you can avoid learning SQL or the underlying DBMS then great, I have no problem with people ignoring things that are ignore-able. Problem is you can't. You will need to learn the DBMS, and the ORM will end up just getting in the way of what you really want to do.
I've used ORMs extensively in my career, and I've seen it trotted out as a Boogeyman here a million times. Why do I still prefer writing queries with a good ORM over awkwardly composing SQL by conditionally concatenating strings? Is buggy, error prone string concatenation with a bunch of if statements the true way? What am I missing?
Conditional concat should be pretty rare. When are you doing that?
When I need a new where clause or a new order by, or I have to stop using an IN filter because you can't use an empty tuple. Stuff like that.
Yeah, if you're going to conditionally add filters, this is mostly the only way with SQL. There are a few tricks depending on the dialect, like =ANY() is more convenient than IN, or COALESCE can avoid some dynamic WHEREs.

You're not really missing anything other than query builders, which are worse for other reasons.

I was actually going to comment on how breathtakingly ignorant the blog post author was for making this statement. What on earth ORM would make a UNIQUE constraint against multiple columns without an explicit instruction, much less where some of the columns are NULLable and then leave the user entirely unaware that anything like that was happening? ORMs do nothing like this and the developer would always have built this constraint explicitly and therefore is working at the DDL layer. ORMs have nothing to do with whether or not someone knows SQL just like cars have nothing to do with whether or not someone knows how to walk.