Hacker News new | ask | show | jobs
by ganomi 1853 days ago
To get production EXPLAINS for problematic queries you can activate auto_explain on a postgres instance. For my transactional system i have set it up to log EXPLAINS for all queries that take more than 2000 ms.
3 comments

Auto_explain is a pretty great tool to spread knowledge on yea - I've actually built out a lot of functionality related to our DB handle where I work and one of the features I added was a software configuration to establish a threshold that could also be impacted by other runtime variables. We've used this to track specific classes of queries over time and figure out what's going wrong and it can be advantageous (if you know a query sometimes does run long) to capture explains of it executing quickly - sometimes you'll get really helpful information like the query planner changing it's mind when passing a threshold of so many rows and know clearly what you want the query planner to decide to do.

If you're a small enough shop to consider it I highly recommend setting up something to automatically explain queries meeting some criteria on production or using some analysis stack (like new relic) to just capture all the query executions within certain time windows.

These tools all come with costs and should never just run continuously on production if you're getting no benefit from them, but the value can be quite significant.

Great extension, yes. There is overhead when enabling the timing and buffers options, but sometimes it's not big [1]

But auto_explain solves only part of the task – you can see what happened, but cannot see the answers to "what if" questions. ("What if I used this index?")

[1] https://www.pgmustard.com/blog/auto-explain-overhead-with-ti...

It's good to auto explain but I would also add that going in and running explain (analyze, buffers) is really beneficial to seeing how much the query uses buffers and how many pages it has to load from disk.