Hacker News new | ask | show | jobs
by binarymax 1179 days ago
This is why I always advocated against ORMs. It’s so easy to fall into traps like this without even knowing it, and while you can work around it in some ORMs it is not obvious.

Writing SQL is not that hard, and mapping the results to a type isn’t that hard either. So with an ORM you might end up saving several hours of work up front for lots of pain later.

5 comments

> Writing SQL is not that hard, and mapping the results to a type isn’t that hard either.

But... Then you've written an ORM.

Nitpick: no, you've written a pure object mapper, that doesnt care about schema relations. This has the practical advantage of being just a data container that can be clearly serialized/deserialized, instead of a model object with a transitive database connection dependency.
I don't know what serialization/deserialization has do with this. Does the object map to database rows and there's code that moves the data back and forth? That's an object relational mapper.
mapping database rows to object structures is an object mapper. An object relational mapper also keeps track of table dependencies (such as related fields).

If you read a row from a database and generate an object whose attributes map the fields in the database and are used to retrieve the values (a data object), that is an object mapper. This means when fetching eg. user.type it will return 1 instead of the data object for the corresponding row on user_type;

If you read a row from a database, exactly like the above, but user.type returns a data object representing the related table row, that's an object relational mapper.

Regarding serialization, why does it matter? Because you need to serialize and de-serialize data objects or models if you're adding cache to eg. a service layer. Also, serialization and de-serialization are quite important when interfacing with eg. external systems - Imagine having an application-wide InvoiceModel that can be transported via REST, GRPC, kafka/json or any other format, and that is database-agnostic.

> An object relational mapper also keeps track of table dependencies (such as related fields).

that's not what "relational" means. "relational" means, "a relational database". where we are using SQL statements to deliver data to and from such a database. an ORM that does not directly interpret objects along one-to-many collections /many-to-one attributes is still an object relational mapper.

> Regarding serialization, why does it matter? Because you need to serialize and de-serialize data objects or models if you're adding cache to eg. a service layer.

that's a separate concern from object relational mapping. Do you have the notion that ORMs produce objects that aren't compatible with serialization?

> an ORM that does not directly interpret objects along one-to-many collections /many-to-one attributes is still an object relational mapper.

This is diluting the term ORM beyond any usable definition.

A bit like when people claim that any programming language where you can associate a function with data and have the function implicitly get passed a reference to the data through data.function() call syntax is an OO language.

First off, I question your definition of "object". Is a C struct an "object"? Is a python tuple an "object"? I know that in python it literally is an object, but so is a function. But obviously your definition of an object must be separate from any given language.

To that end, does python's sqlite3 interface constitute an ORM? It converts the results of SQL queries to tuples. What about if I change the row_factory to sqlite3.Row? Is it now an ORM?

Where do you draw the line between something which is and isn't an ORM?

What is the point of such a vague definition?

What isn't an ORM in this case?

> that's not what "relational" means. "relational" means, "a relational database". where we are using SQL statements to deliver data to and from such a database. an ORM that does not directly interpret objects along one-to-many collections /many-to-one attributes is still an object relational mapper.

It doesn't. It means it works as a virtual object database that honors relationships on the underlying schema. The fact that the schema is relational is an implementation detail. Many ORMs will work with non-relational databases, but will honor underlying relations transparently if they exist, such as eg. SqlAlchemy on MongoDB.

An object mapper isn't even a "database-related" technique. Its just a mechanism to map data into objects - ex. JSON deserialization into object attributes is object mapping. When used in conjunction with traditional database mechanisms (such as eg. repository pattern and query builders), can be used to minimize data conversions between systems.

> that's a separate concern from object relational mapping.

Well, it is because in Django ORM or SqlAlchemy you use differerent declarations of data structures for data received from requests, data passed around and data read from and written to the database. It is actually a very leaky implementation where eg. when designing models you feel like you're working on a business object representation, but in fact you're still catering to implementation details of the relational model, such as primary keys, nullable fields, indexes and so on and so on.

> Do you have the notion that ORMs produce objects that aren't compatible with serialization?

That is precisely the point. This is not a feature, it is a quite important limitation. If you use data objects, this issue (as well as complex mapping routines) goes away, because then these objects only hold data and not business logic. These concepts aren't really mainstream in python, but are bread and butter in many other languages.

So many people say this, but I have to imagine that such people haven't actually really tried to write SQL in their services and avoid the temptation to introduce all sorts of abstractions.

Because in my experience this doesn't happen. You just have to be OK with some duplication (which most engineers over-focus on to the point of serious detriment).

In my experience I did what you describe above for about ten years then I wrote SQLAlchemy, tired of writing the same code over and over again (and successive projects certainly gained more and more abstraction as the constant repetition became more obviously a huge waste of time and verbosity). So that's one example.
I wouldn't call just mapping sql result to struct as ORM.

I'd draw a line at generating queries that are more trivial than "get a record by field" as ORM. Below that is just syntatic sugar over raw SQL.

It's extremely easy to write SQL queries that take a lot longer than they look like - an unindexed full table scan looks exactly the same as an indexed join, for example, whereas in a good ORM they will look different. So as far as I can see writing SQL manually is just extra drudgery for no real gain.
I don't know.

Hand-written SQL means a bunch of work and you have to know SQL. But it is simple and transparent.

ORMs automate a lot of simple and repetitive SQL, but to use them effectively you really have to know SQL extremely well and understand the ORM deeply as well.

So I guess it depends on what you are doing. ORMs can be useful but they require a lot more knowledge to use effectively than hand-coded SQL does.

> ORMs automate a lot of simple and repetitive SQL, but to use them effectively you really have to know SQL extremely well and understand the ORM deeply as well.

I'm not sure. You need to know the relational model pretty well, but you don't have to remember the zillions of quirks and edge cases or differences between dialects that SQL has. IME that's what takes most of the memorization effort.

> an unindexed full table scan looks exactly the same as an indexed join, for example, whereas in a good ORM they will look different

ORM queries compile down to SQL, so how would they look different?

Typescript compiles down to Javascript but an unchecked cast looks different from a known-safe assignment in Typescript even though they look the same in Javascript.
We're talking about something that exists at the query planner level, not a new feature introduced on top of that. Whether to use an index or a table scan isn't chosen by the user, it's chosen by the engine when the SQL is run. ORMs don't have any special hook to "look different" - if the database engine wants to do a full table scan, it'll do it whether the query came from manual SQL or an ORM, because they look the same to the database.
Depending on the database, the ORM may well add hints for the planner (of course it's theoretically possible to do that for manual queries as well, but it's usually verbose and database-specific). At a minimum the ORM has a model of what indices exist and so it can know which queries "should" be using an index, and may even have an internal model of how the query planner makes those decisions; of course ultimately if the query planner is perverse enough then even an automated system may not be able to get it to do the right thing.
An ORM is about more than mapping results to types, in Djangos case you get a powerful DDL generator, migration management, constraint validation when saving, DB portability, ...
Automated migration management is probably the most important part of Django's ORM.
ORMs generally give you a lot of nice things, and you usually (always??) can just write pure SQL and use the models you've defined (and all those nice things).

So, use an ORM, but write SQL if you want? Sounds like a good idea, actually.

You're on the right side of the bell-curve meme my friend, but there are a lot more people in the thick-framework camp who spend their days getting lost in the complexity of ORMs and related tech...
It's more like the world is not black and white, engineering problems don't have "The perfect solution, the rest is trash", but rather "this problem has multiple solutions, depending on context, some have these tradeoffs and the others have these".

In this particular case, it might not be worth to trade speed of having to think about SQL for performance (today or tomorrow). Maybe you're building something that will just be used by 2-3 people, so 1+N isn't really a issue.

Or whatever, the conclusion as always is: it depends.

I agree it depends, but my hot take is almost universally the time people like to say ORMs save them they end up paying back in spades debugging them. Learn SQL!
I really think it is more a matter of exposure and familiarity than bell-curve positioning. If any backend engineer with 1 year of ORM experience had spent that year instead becoming familiar with SQL, the speed bump would be practically nil.