Hacker News new | ask | show | jobs
by marcosdumay 843 days ago
And yeah, you are just reinforcing the GP's point, because all of those are atrocious.

I am at the ORMs are bad club, but if you try any of the systems that encode the logic paradigm into their interface, they are a completely different kind of beast. (MS has brought some of it into C# with linq so you can use with the Entity Framework. It's not a fully logic system, and has so many attrition points with the Entity Framework that it's mostly not useful at all.)

1 comments

I'm very familiar with Linq and Linq to SQL. It's as close as any ORM got to being "good enough" and was abandoned by Microsoft very early in its lifecycle. Linq in Entity Framework is a minefield of N+1's and excessive eager loading.
The N+1 problem is solved in iommi (again, I am one of the authors) by simply:

1. keeping a log of queries per request with stack traces

2. at the end, checking if there are N+1 problems

3. print a big warning with example SQL and stack trace to the console

During dev if I miss a prefetch_related/select_related the console will start to print gigantic stack traces telling me where the problem is. I don't have these problems in production because I get alerted to them in dev.

These are not only solvable problems but EASILY solvable problems!

(Also, in iommi tables, most of the times it can automatically figure out and do the prefetch/select related for you, further diminishing the problem)

Can you explain what an N+1 problem is?
You have a query like:

select * from book where author_id = 5

If you represent your data as objects, you'll create a Book class with an attribute of type Author. If you now want to run the query above, you'd say (in c# likelihood, but not a real Entity query):

Database.Set<Book>().filter(author.id = 5).all()

But that instructs the ORM to fetch the author attribute from all books, and only then filter it by id. So, you will end up running the following query:

select * from book join author on book.author_id = author.id where author.id = 5

There is no reasonable way to represent the difference between this query and the one on top with an object representation. And even though both have exactly the same semantics, databases have been historically bad at optimizing them so the second one can be orders of magnitude slower.

EF Core compiles a query for the entirety of the expression provided. I wonder if it actually outputs select * from book join author on book.author_id = author.id where author.id = 5 as you described - this seems like a pit of failure a framework would work hard to prevent from happening.
Example:

You grab all Albums. Then you make a table of the albums with name and artist name. The artist name is in a separate table linked with a foreign key.

So the code might be:

    for album in Album.objects.all():
        print(album.name, album.artist.name)
Django will need to query from the foreign key ID (album.artist_id) into the Artist table to get the name of the artist. This means every loop step does a query.

So "N+1" because it's N queries (every album) plus 1 (the original "all albums" query).

The fix in Django is to do `Album.objects.all().select_related('artist')`. Then there will be just one query.

In EF line expression you can easily specify which navigation should be Eager loaded.

var customersWithOrderDetail = context.Customers.Include("Orders").ToList();

Would generate :

SELECT * FROM Customers JOIN Orders ON Customers.Id = Orders.CustomerId

Imo the issue isn't that it's hard or easy to fix. The issue is that you might not be aware that you need to fix it.

The awareness is the key imo. That's what iommi's tool gives you out of the box.

Thankfully, not an issue as of today. EF Core has client-side evaluation disabled by default. It was an enormous footgun in EF so to bring that behavior back you have to explicitly opt into it.