Hacker News new | ask | show | jobs
by sgarland 660 days ago
> This language is so high level that you need to come up with tricks and query analyzers in order to hack the high level query into something performant.

What? You have to understand a language to write performant code in it. That’s not a hack, that’s basic competence.

2 comments

> What? You have to understand a language to write performant code in it. That’s not a hack, that’s basic competence.

The poster is not referring to understanding the language, he/she is referring to having to guess at how to structure the query in a way that increases the chances that a good plan is within the constrained search space of plans (due to it being a combinatorial problem and the optimizer has limited time and information).

No it's not. Many language are designed so certain performance aspects are invisible. Take golang and python: The garbage collector.

The garbage collector is something you should not think about when programming and most programmers don't even need to know it exists. You only think about it when you need to seriously optimize things.

For SQL it's a bad design choice because of what SQL is and what SQL is targeting.

SQL is a high level abstraction that automatically determines a query plan based off of high level input. The problem here is that it can choose a bad query plan. So you need to "hack" the query in order to trick the query planner into doing what you want.

It's also targeting the slowest part of the stack: Non volatile memory and IO. The slowest part of the stack should be targeted with a zero cost abstractions to maintain speed while the fastest part of the stack you can use a language like python for your web app it's fine because databases are magnitudes slower.

Point taken re: gc.

For query plans, in nearly every plan flip I’ve encountered, the root cause came down to either poor table / column statistics (because they had let the table grown too large without appropriate tuning), having an enormous amount of joins beyond the deterministic level the planner could provide, or sub-optimal queries. The latter is growing less common as planners get better and better at rewriting on the fly, but it’s always good to know how to do things optimally.

A good example is semijoins / antijoins. With modern versions of both MySQL and Postgres, you’ll probably have “WHERE foo IN (…)” turned into “WHERE EXISTS (SELECT 1…)”, but it’s better to write it the optimal way in the first place.

I will grant you that indexing can be bewildering, with its many rules, caveats, and gotchas.