Hacker News new | ask | show | jobs
by tomnipotent 2690 days ago
> do more to expose what the cost of various operations are

Both Postgres & SQL Server support EXPLAIN on DML without running the statement (with varying levels of depth).

    CREATE TEMPORARY TABLE tmp_blah (num INT);
    EXPLAIN INSERT INTO tmp_blah SELECT * FROM generate_series(1, 100000) ORDER BY random();

    QUERY PLAN
    Insert on tmp_blah  (cost=62.33..74.83 rows=1000 width=4)
        ->  Subquery Scan on "*SELECT*"  (cost=62.33..74.83 rows=1000 width=4)
            ->  Sort  (cost=62.33..64.83 rows=1000 width=12)
                Sort Key: (random())
                ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=12)
1 comments

One challenge with that is that the query plan often depends on the current table composition statistics. If a value is relatively common (or uncommon), or the size of the table has grown or shrunk, you may end up with a materially different (and conceivably substantially worse) plan.
I can't tell you how many times over 20 years I've heard a DBA tell me "the statistics weren't updated" after an incident.
I've written cron jobs to update mysql statistics to prevent it from choosing bad query plans. It's as terrible as it sounds.
How come it's not automatic and built-in? Why is a script terrible? (Does it have to enumerate all tables and piecewise run the stat update?)
Yes, the stats update is per table. We only updated it for a few tables we really cared about.
This is my reason for preferring NOSQL where possible.