Hacker News new | ask | show | jobs
by nothrabannosir 2603 days ago
Would that come with the benefit of not getting nasty surprises when the cantankerous scheduler suddenly decides to mess with a previously efficient solution? Or even protect me from myself in the future accidentally breaking an efficient query through an ostensibly innocent change?

I would appreciate a way to encode execution efficiency parameters in the semantics of the query itself. Specifying which indexes to use, or the runtime complexity of certain operations. This would also help me figure out which indexes I need to create, in the first place. Today I can add a clause in an order which makes using an index impossible, rendering a previously efficient query suddenly extremely expensive. The failure mode is very obscure: it all works, the DB just works overtime. I might not have a large enough DB yet to actually notice this, until one day I see these slow queries and have to backtrack all the way to that one commit that caused it, six months ago.

I have encountered this scenario a few odd times. It is not always obvious to someone changing a query that it is meant to use an index, and it is very hard for someone writing it to specify so in a unit test.

Flexibility for the scheduler to choose efficient strategies cuts both ways. I'd appreciate the ability to leave more explicit annotations about expected runtime characteristics, than mere comments offer.

2 comments

This talk ("How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by Lukas Eder" https://www.youtube.com/watch?v=wTPGW1PNy_Y) is not particularly gripping, but he does claim that modern database engines will ignore indexes if they estimate it will be faster to read all the data than it is to access the index first then read the relevant data. And that Oracle can have multiple execution plans and switch between them mid-query if the first one isn't working out as expected.

Could this be a rare case of the mythical sufficiently smart compiler having more information at runtime than the programmer has at coding time, and that if you could hint which index to use, the trade off would be that you'd instead be troubleshooting queries which end up slower because of it?

> Could this be a rare case of the mythical sufficiently smart compiler having more information at runtime than the programmer has at coding time [...]

Mythical? Isn't this the core concept behind optimizing JITs?

> I would appreciate a way to encode execution efficiency parameters in the semantics of the query itself. Specifying which indexes to use,

That's a mistake. You can in MSSQL but you'd likely regret it (this from experience). Using an index can be a mistake; sometimes a table scan can be more efficient. And which, full scan or index, is more efficient is down to the data in the table and the query predicate.

(Edit, so which to do, scan or index, can be different for exactly the same query depending on the test:

  select * 
  from people 
  where surname = ?
can be either depending on whether surname is 'smith' (vey common) or 'hepsibah' (very rare) ).

> I'd appreciate the ability to leave more explicit annotations about expected runtime characteristics, than mere comments offer.

Interesting. Can you elaborate?