| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tristor 754 days ago

SQL is easy to understand from the perspective of syntax, but quickly becomes challenging to understand depending on the data model how to actually get what you want out of a query. I have spent many years working in SQL across different server products (PostgreSQL, MySQL, Microsoft SQL Server, and others), and I still sometimes find myself struggling to build the most efficient query for task. Most of the time, it's not SQL that's the problem, it's a broken data model that preceded me (often created by an ORM) that I am forced to work with.

All that said, while it may be painful sometimes, learning it can at least let you poke your ORM more towards the right direction, or give you an opportunity to identify the hot spots where it's worth investing the effort to write a more efficient query directly, much like where sometimes it makes sense to write some subset of an application in assembly for performance, while most of the time you're better off just letting the compiler optimize for you.

For the problem you're talking about above, you're trying to avoid a Cartesian explosion, which is probably best handled through query splitting as you mentioned, but it depends on the database engine how best to approach that. For all its warts, Microsoft SQL Server is pretty good about things like the ability to use query splitting within the context of a single transaction to make it serializable and ensure data consistency (although that can come with its own challenges for performance).

The example you provided is nearly exactly the case where I would expect an ORM to produce a bad query, where-as a direct query build would be significantly more performant because you can incorporate a clear understanding of the data model and use limits and ordering to minimize the size of the result set you need to deal with.