Hacker News new | ask | show | jobs
by dagss 748 days ago
What popular SQL databases need is an option/hint to return an error instead of taking a slow query plan.

That way a lot of SQL index creation -- something considered a black art by surprisingly many -- would just be prompted by test suite failures. If you don't have the right indices, your test fails. Simple.

In this case, have TestDeleteCustomer fail, realize you need to add index, 5 minutes later done and learned something. Would be so much easier to newcomers... instead of a giant footgun and obscure lore that only becomes evident after you have a oot of data in production.

Google Data Store does this, just assumes that _of course_ you did not look to do a full table scan. Works great. Not SQL, but no reason popular SQL DBs could not have an option to have query planners throw errors at certain points instead of always making a plan no matter how bad.

SQL has a reputation for a steep learning curve and I blame this single thing -- that you get a poor plan instead of an error -- a lot for it.

3 comments

What postgres needs is some command to freeze query plans. It doesn't need it very badly, because the planner is excellent, but the need exists.

But what is a slow query plan in development has no relation at all with what would be slow to run in production. You create that error and the consequence will be a lot of spurious indexing with little practical benefit.

This seems like a very easy thing for any sort of middleware (or ORM) to do for you. Maybe even add typed where clauses that only exist for indexed columns.
I don't understand the idea ... do you mean a middleware that parse the SQL query and contains a query planner and has full knowledge of all the indices in the database .. or something else?
> What popular SQL databases need is an option/hint to return an error instead of taking a slow query plan.

You can run ANALYSE on Postgres.

I am an MS SQL user and didn't touch postgres but can I assume analyse is a tool for displaying the chosen query plan for a query?

If so it is in all DBs I would think but it is a bit too manual for my taste, still a big hurdle and opt-in...vs just writing code (including SQL code) and tests like normal and opt-out on getting errors if you get poor plans.

But...probably some tooling could be made to do such analysis automatically and throw similar errors...

Does statistics ever cause query plans to suddenly change on postgres? In MS SQL you would also need to pin the plan / disable statistics on tables...

> If so it is in all DBs I would think but it is a bit too manual for my taste, still a big hurdle and opt-in...vs just writing code (including SQL code) and tests like normal and opt-out on getting errors if you get poor plans.

The problem with that is what is considered a "poor plan" largely comes down to the indexes used, and suitability of an index is totally dependent on how it is used in the application.

Who sees the error? The DBA? The application developer? What's the cutoff for "poor".

The stats of the query allow those that know/care to make decisions. That's the one size fits all, simple, tool.

What I had in mind was the simple OLTP usecases. To compete with NoSQL in developer ergonomics.

My context is people choosing NoSQL because SQL is too hard to learn and has too many caveats; not because of performance etc

So basically a mode for the planner where you:

- throw away statistics

- consider all tables infinitely large

- ...except the log(N) of an index lookup is OK

- then find a non-infinite plan or crash

Yes, it does not work in all cases. But it avoids giving SQL a bad reputation and push people towards NoSQL DBs for the common case of simplistic OLTP queries.

The error should be seen during testing. The "DBA" sees it there are issues then you are deploying code without test coverage..

For more sophisticated queries, you would not enable this flag, and be aware that you are doing a nontrivial query.

Analyze collects statistics the query planner uses to determine the query plan. It can change the resulting plan, yes.

Production databases using different query plans sure is annoying and cause problems, but I'm not so sure whether returning errors is better. "Slow" beats "not working at all" in almost all cases. The typical case it will select a different query plan once the data grows, which is not so straight-forward to test for, especially since the hardware of your production may have quite different performance characteristics.

Pinning the plan is temping, but has the downside you risk running a bad plan because what works well for your 100k test rows may not work equally well for your 1b actual rows, and testing all of that is again tricky. That's not really a brilliant either, and may also make your application slow.

Just keeping an eye on slow query logs and/or query performance statistics is the general approach. I don't think it's really possible to improve on that without making some pretty serious trade-offs in other areas.

Note that I am ONLY talking about a mode to use for limited, trivial OLTP style queries. The kind where the query planner will never be in doubt -- if you just have the right indices in place.

The kind of simple backend software queries where people consider NoSQL instead to avoid SQL's oddities.

The mode I talk about is very inappropriate for any kind of reporting or analytics query or ad hoc queries etc.

> "Slow" beats "not working at all" in almost all cases.

In the specific context specified above, I disagree with this.

Mainly because "not working at all" will be caught during testing (because you should have test coverage of your SQL queries). Slow = undiscovered during testing.

But even assume you didn't have test coverage and the code made it to production -- yes I probably want a seldomly used "DeleteUser" API call to crash if foreign keys were missing indices, instead of doing it anyway and consume lots of DB resources.

> Just keeping an eye on slow query logs and/or query performance statistics is the general approach.

The feature I proposed was to help newbies learn SQL.

This requires expertise in the team, and easily shifts work away from the newbies in the team writing SQL (don't think properly through indices during development) to the single SQL expert in the team.

Depends a bit how your work is organized etc; I like that SQL indices etc are as closely linked to the backend development process as possible; not considered a post-optimization..

If you get the error, you can either make the index you need for a perfect and trivial query plan -- or declare "non-trivial mode" and be back to today's situation.

You don't know what the "right indexes" are, because sometimes "no index" is the "right index". Sometimes because a full table scan is faster. Sometimes because you're okay accepting the various performance trade-offs (e.g. insert speed vs. update speed, storage space on disk).

Many applications don't have tests for every single last trivial SQL query, and adding those just because the SQL server may decide to bail out because it might perhaps possibly could be 100ms slower is not a good way for most teams to spend their time.

In the end it's just trading one confusion for another confusion. But the current confusion has a lot less complexity overall, so that clearly the "better" one IMHO.

> Just to turn off the possibility of table scans on that individual table in all environments

I do not remember the name but there was a sorta pg-compatible database started a few years ago that intentionally did not support non-indexed queries and if you tried a query for which a index was available it would create it before running the query (and keep it live for some time).

I thought it was an interesting tradeoff (even if likely not the one I would choose) a signigicant lag spike followed by better performance rather than a uniform degradation over time.

...yet people flock to NoSQL to avoid the complexity of having to learn SQL. Or stick JSON into a single table in SQL, because using more than one table is too complex. And so on. (This is my context where I work, if you don't have to deal with this and people around you happily embrace and learn the footguns and complexities of SQL then good for you..)

Perhaps you like this idea better, a table-specific hint:

    create table MyTable (

    ) with (assume_infinitely_large=on)
Just to turn off the possibility of table scans on that individual table in all environments. That way you don't have to do this in ALL situations, only the ones where you know that full table scans in production will be out of the question.

> any applications don't have tests for every single last trivial SQL query, and adding those just because

Another perspective here though:

If code does not have test coverage, what really happens is the code gets tested (or "hardened") in production. And isn't it then better to have things crash straight after deploy, than to have it gradually and silently degrade as you get more users?