Hacker News new | ask | show | jobs
by pyalot2 4489 days ago
Sure for simple things, of which there are many, ORMs can work sufficiently up to a point.

But, the problem is that, you rarely get to go to your boss and say, well, this thing we've been using to do, you know, our data stuff, it's completely wrong. We need to throw everything away and start from scratch.

The only situation in which you get to say that, is when you're explicitly make it clear that you're writing a raw prototype that'll not scale, and that nothing short of a complete rewrite is gonna be required as soon as you finish (and even then, it can turn out tricky to actually do that).

Unfortunately, most people who get themselves into the ORM mess don't realize that this coming for them in the future. So they don't go their boss and make the prototype proposal. This all but cements the entire failure of the project right then.

2 comments

I don't know my main experience is with the Django ORM. Others have mentioned that some ORMs seem to start from the relational model, and transform those into objects (which sounds like how the Django ORM works to me).

I find this works well. It speeds up development of all the simple stuff , and by the time it gets complicated, you can drop down to SQL.

I don't find the two ways of thinking incompatible, in fact the "extra" method on Djangos ORM is an intermediate where you let the ORM do its normal simple style of mapping, but allows you to add extra clauses to the SQL manually. Once you do this you see how the ORM works, and can choose quite easily when it is appropriate to use it.

You loose a bit of the magic that comes with Django by the time your queries get more complex (like auto generated admin pages), but the benefits outweigh the costs immensely in my case.

I am in the same boat as you currently. I work with Django almost exclusively these days and use the ORM in every project. The difference is that I am working with mostly normalized databases and datasets in the 100's of megabytes. This is a great place to use an ORM. You get all the simplicity and quick prototyping and pay very few penalties. When you have a slow query you can usually address it by adding some select_related() or prefetch_related() and you are done.

What this thread is talking about is when you have datasets that are several gigabytes in size and up, when you have many dozens of tables, all interrelated and when you need to squeeze as much performance out of your hardware as you can. When your code gets complex enough where you cannot just insert an object, but have to do all manner of verification through half the DB first. Or where you need to operate on 100,000 rows in an intelligent manner very fast (think using stored procedures and complex query parameters). This is a place that you might end up and if your entire codebase depends on an ORM, you are often out of luck.

In the long run, I think there several things to do: (1) know SQL well, (2) use an ORM when you know the project will be small, and (3) anticipate that if you are growing and using an ORM, you will need a re-write.

I agree, and my application is pretty much what you describe in your last paragraph.

80% of the queries are simple and use the ORM directly. A few need an extra clause for stuff that isn't handled by the ORM, and I have a couple that need to be high performance (and are abstract enough that they don't fit well with an ORM), which are done via the raw SQL.

I completely agree with your premise, don't get me wrong. I think the ORM is a great way to do a small project that you know won't grow big (there are plenty more of these than big projects and most of the time you can tell. For example, a small-ish e-commerce site I was involved in last year).

My experience was somewhat different in terms of ditching SQLAlchemy: the performance started to suffer as we kept adding clients. Also, developers started tripping over their own feet to do simple things as relationships got much more complex and the ORM did not allow us to impose certain rules easily (data did not always flow through the ORM as it was due to historical and performance reasons). Instead, a co-worker and I sat down and wrote a small library that did the common things for you. It was broken up into modules and we standardized on some specific methods for each "object": get, find, insert, delete, update. Each object also had non-standard more interesting methods that allowed us to manipulate relationships in a controlled fashion. The management did not mind this as it took very little time, was a gradual transition (we converted performance critical code first, then anytime we touched anything else).

Some things I learned from this:

1. Creating and maintaining a library like this is actually very little work. Who cares if it's raw SQL if it's super simple and you only write it once. Something like "SELECT * FROM vehicle WHERE id = ?" does not need unit testing, etc. and almost never changes. On the other hand the crazy complex queries we had were actually simpler in raw SQL than trying to express them in the ORM. To use an ORM efficiently you must know SQL already, except you can usually write better, clearer SQL than an ORM can.

2. The benefits in performance were huge. We could do all manner of dirty hacks inside the library functions since the library was self-contained.

3. Migrations were a pain, until we created a simple text file where we would append the latest migration. I used a tool that compared schemas to verify that the live database was in line with what a blank initialization of the DB looked like. This could definitely use some better tooling but practically this was a small pain.

4. The benefits from code that used this library were even bigger than the performance benefits. Other developers on the team were locked into what that could do with the data. If they needed to create a new method for a given table/object, the went ahead, but anything complex typically got reviewed by at least one other person and bugs were caught early. Moreover, all the data manipulation now happened in one single library so it was much harder to introduce inconsistencies.

5. We got to actually use our RDBMS. ORM's try to give you the lowest common denominator between the ORM's they support. They don't typically let you do views, sharding/partitioning (in the RDBMS sense, not in the application), SQL functions/procedures, etc. We got to do all of this and it often got to be the better solution.