by bshipp 2418 days ago
Similar to what @overcast said: initially I didn't use select_related at all. Almost immediately I saw huge DB utilization, with hundreds of thousands of tuples returned for what I assumed were pretty simple queries. I realized, as @overcast said, that the ORM was looping, issuing a separate query for each related row instead of asking for everything at once. So I added indexes and appended select_related to almost every query. Then I figured it was fixed.
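The looping described above is the classic "N+1 queries" pattern. Here is a minimal sketch of it using Python's built-in sqlite3 module instead of Django, so it runs without a project; the author/book schema and all function names are hypothetical. It counts executed statements with sqlite3's set_trace_callback. The looping version issues one query per book. A single JOIN (roughly the kind of SQL select_related() generates) issues one. A second IN (...) lookup stitched together in Python (roughly how prefetch_related() works) issues two.

```python
import sqlite3

def make_db(n_books: int = 50) -> sqlite3.Connection:
    """Build a hypothetical author/book schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES author (id));
    """)
    conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                     [(i, f"author {i}") for i in range(10)])
    conn.executemany("INSERT INTO book (id, title, author_id) VALUES (?, ?, ?)",
                     [(i, f"book {i}", i % 10) for i in range(n_books)])
    conn.commit()
    return conn

def count_queries(conn, fn):
    """Run fn(conn) and return (result, number of SQL statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)  # called once per executed statement
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)

def books_looping(conn):
    # One query for the books, then one more query per book: the N+1 pattern.
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    out = []
    for title, author_id in books:
        (name,) = conn.execute("SELECT name FROM author WHERE id = ?",
                               (author_id,)).fetchone()
        out.append((title, name))
    return out

def books_join(conn):
    # A single JOIN, roughly the shape of SQL that select_related() produces.
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()

def books_prefetch(conn):
    # Two queries: the books, then all of their authors in one IN (...) lookup,
    # matched up in Python, roughly how prefetch_related() works.
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    ids = sorted({author_id for _, author_id in books})
    marks = ",".join("?" * len(ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM author WHERE id IN ({marks})", ids).fetchall())
    return [(title, names[author_id]) for title, author_id in books]

conn = make_db(50)
looped, n_looped = count_queries(conn, books_looping)
joined, n_joined = count_queries(conn, books_join)
prefetched, n_prefetched = count_queries(conn, books_prefetch)
assert looped == joined == prefetched  # same rows whichever way they are fetched
print(n_looped, n_joined, n_prefetched)  # 51 1 2
```

The gap grows with the data: at 50,000 books the looping version would issue 50,001 queries, while the other two stay at one and two.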

Once my database hit 100GB and a few hundred million rows, I had no choice but to sit down and actually learn what each of my ORM commands was asking my database to do. Sometimes I removed a select_related. Sometimes I replaced it with prefetch_related. Sometimes I eliminated an entire filter operation or moved it elsewhere. A few times I injected a greatly simplified raw SQL query instead of relying on complex ORM-generated SQL. In four instances I replaced expensive join operations with periodically rebuilt "materialized views" to reduce CPU usage and DB I/O. I timed every change with django-debug-toolbar and/or pghero, with the goal of minimizing database load and network congestion.
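The comment doesn't say exactly how those "materialized views" were built. As one possible shape, here is a minimal sketch, again using Python's sqlite3 (SQLite has no native materialized views), of a summary table that precomputes an expensive join-plus-aggregate and is rebuilt on a schedule. The author_stats table and the rebuild_author_stats() function are hypothetical, and the scheduling itself (cron, a task queue, etc.) is left out.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, pages INTEGER);
        -- Hypothetical precomputed table standing in for a "materialized view".
        CREATE TABLE author_stats (author_id INTEGER PRIMARY KEY, name TEXT,
                                   book_count INTEGER, total_pages INTEGER);
    """)
    conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Bo")])
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                     [(1, 1, 100), (2, 1, 250), (3, 2, 80)])
    conn.commit()
    return conn

def rebuild_author_stats(conn):
    """Recompute the summary table; meant to run periodically, not per request."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM author_stats")
        conn.execute("""
            INSERT INTO author_stats (author_id, name, book_count, total_pages)
            SELECT author.id, author.name, COUNT(book.id), COALESCE(SUM(book.pages), 0)
            FROM author LEFT JOIN book ON book.author_id = author.id
            GROUP BY author.id, author.name
        """)

def author_stats(conn):
    # Request-time reads hit the precomputed rows; no join or aggregate here.
    return conn.execute(
        "SELECT name, book_count, total_pages FROM author_stats ORDER BY author_id"
    ).fetchall()

conn = make_db()
rebuild_author_stats(conn)
first = author_stats(conn)

conn.execute("INSERT INTO book VALUES (4, 2, 20)")
conn.commit()
stale = author_stats(conn)   # still the old numbers until the next rebuild
rebuild_author_stats(conn)
fresh = author_stats(conn)
print(first)  # [('Ada', 2, 350), ('Bo', 1, 80)]
print(fresh)  # [('Ada', 2, 350), ('Bo', 2, 100)]
```

The trade-off is staleness: readers see the data as of the last rebuild. In PostgreSQL, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW provide this natively, and REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking readers at the cost of requiring a unique index on the view.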

So select_related was sneaky: I thought I had solved the problem very early on, when in fact I had merely postponed it until much later. If your database stays small, you'll likely never encounter this issue.

The solution is not to fire and forget the application, but to install something like django-debug-toolbar and monitor what your program is doing as the database grows. But for heaven's sake, don't worry about that problem today. Get your app working so you can make money. Once it's done, however, remember that your ORM has put a thick collar on your new puppy, and as it grows you'll need to expand that collar or you'll slowly strangle your pet.