Hacker News new | ask | show | jobs
by solox3 1944 days ago
Something about select_related? Please do share.
2 comments

It wasn't anything that fancy, it's just the solution was something you usually try not to do so it just wasn't coming to mind for him. The data being looped over came from solr, and some of the fields were primary keys used in lookup tables in the database, for getting translated text. Instead of doing the lookup inline, load the entire table into a python dict before the loop (<100 rows for each of these tables) and do the lookup from the dict in the loop.

And like I said above, usually you don't just select out the entire contents of a table and handle it in the application, so I reflexively did the wrong thing as well, because of how easy it was to do with Django's ORM.

My guess is accessing a related field within a loop causing a database request per iteration, e.g.

``` [book.author.name for book in Book.objects.all()] ```

That’d be my guess: prefetch_related is great but you need to guard it with something like an assertNumQueries test to avoid accidental regressions.
Maybe I spent too much time with Django already, but if I see anyone doing anything but Book.objects.values_list('author__name', flat=True) for this type of expression, I would mark it as a 3x WTF? in the code review.
As written, it's obvious you should be doing something else like `values()` or `values_list()`. You're much more likely to fall victim to this anti-pattern if it's done within a standard for-loop that has a bunch of other stuff going on. I just wrote it as a list comprehension to avoid having to muck about with formatting on my phone.
I’d also allow prefetch_related if you’re using more than the most trivial data - no point in duplicating logic you have in your models if you have a method which generates something like a name, URL, etc. based on multiple fields.
Funnily enough, I recently optimized some code along these lines.

The way I sped it up was to call `.values()` on the query, which serializes the data into a dict and prevented me from accidentally making subsueqent calls.

PS: Indent by 4 spaces for code formatting.

s/ident/indent/
Thanks :).