Hacker News new | ask | show | jobs
by davidrowley 856 days ago
(blog author and Postgres committer here) I personally think this would be nice to have. However, I think the part about sending of tuples to the client is even more tricky than you've implied above. It's worse because a new plan does not even guarantee that the same tuples are returned. e.g. if you wrote: SELECT * FROM table LIMIT 10, there's no ORDER BY so which tuples are returned is non-deterministic. It may be easier to do something like queue up X tuples and just start sending those out when the queue is full. When the queue is full, we can say that it's too late to replan and we're locked into the current plan. People can set X to what they want to increase the time that the plan can change at the expense of using more memory and higher latency to get the first tuple.
4 comments

Or it could work in a way that the Planner has access to data about previous runs of each query, and it can use this data to change plans that were proven bad during execution. This way, the first execution would be slow, but Planner could self-learn and better next time. SQL Server has a bunch of similar features in its query optimizer https://learn.microsoft.com/en-us/sql/relational-databases/p....

I'm not sure Postgres has infrastructure to do that, though, because it doesn't have shared plan cache, for example.

Also, many queries might be so slow they never complete, and therefore never populate the cache. (think those queries run by a data scientist with 50 JOIN's)
I'm sure there are reasons the implementation might not be easy, but conceptually this seems fixable. You just need a lower bound of "how bad can this plan be?" and that doesn't require completing the query, just observing that it's been running for a long time and/or used unreasonable amounts of resources.

Also, is the problem with the 50 join query that the planner screws it up, or that it's fundamentally doing too much?

You'd still need analyze to gather table statistics to have the planner produce plans prior to getting any feedback from the executor. So, before getting feedback, the quality of the plans needn't be worse than they are today.
> 50 JOINS

And no indexes.

For many queries, even setting X=1 would probably have big benefits. If it takes far longer than expected to find the first result, it's probably time for a new plan.

Implementing only the X=1 case would also dramatically simplify the design of such a feature. Limiting it to read only queries would also make everything much simpler.

I agree. The primary area where bad estimates bite us is estimating some path will return 1 row. When we join that Nested Loop looks like a great option. What could be faster to join to 1 row?! It just does not go well when 1 row turns into more than 1. We'd realise that the 1-row estimate we wrong by the time we got to row 2, so queuing 1 row would likely cover the majority of cases.
I have no knowledge how common queries with ORDER BY vs no ORDER BY are, but that sounds like a first implementation which only works if ORDER BY is present would be easier and still useful? Or do you think that's not common enough to justify the effort?
You have to remember that because the query has an ORDER BY, it does not mean the rows come out in a deterministic order. There'd need to be at least an ORDER BY column that provably contains unique values. Of course, you could check for that, but then I don't think that's the end of the complexity. Things like SKIP LOCKED skip over rows which we can't immediately lock. If the first time we couldn't lock the lowest order row and output the 2nd, then aborted, replanned, then next time the 1st row wasn't locked, we'd then output the rows in the wrong order. It's probably possible to figure out all these cases and not do it when there's some hazard, but it sounds very tricky and bug-prone to me.
However, at least some ORDER BY queries will have a final sorting step before the client sees the first row. (If you're sorting on a column with an index, it may be able to produce the data in sorted order; if not, though, the final sort seems unavoidable.)

Which means that the capacity to buffer up data until it's all produced is present. But it might still be awkward to make it the default.

I think the first step to making improvements in this area is to have the planner err on the side of caution more often. Today it's quite happy to join using a Nested Loop when it thinks the outer side of the join contains a single row. We have various means in the planner on how certain we might be that the 1 row thing will hold true during execution. An equality condition on a column with a unique index, is, for example, a way we could be certain of getting <= 1 row. If for example, the selectivity estimate concludes 1 row will match for some WHERE clause containing several columns with independent statistics, then the certainty level goes down. It seems silly not to swap the join order and switch to a hash join for this. Best case, we have to build a hash table to store 1 row. Probing that won't be very expensive and could even be optimised further to skip hashing if we continually probe the same bucket for N probes. The cost of additional rows over the estimated 1 row scales much more linearly than the quadratic scaling we'd have gotten with Nested Loop. I imagine a setting which controls how much risk the planner is willing to take would allow users to maintain the status quo of the current costing model. I'd imagine not many would want that, however.
SELECT x, random() FROM foo ORDER BY x;

SELECT foo.x, bar.y FROM foo WHERE bar.z > random() ORDER BY foo.x;

SELECT x FROM foo ORDER BY x; -- Meanwhile, concurrent updates are happening to foo, so re-running the query gets a different result set.

"We choose to do this thing and the others, not because they are easy, but because we ask 'how hard can it be?'"

> It may be easier to do something like queue up X tuples and just start sending those out when the queue is full. When the queue is full, we can say that it's too late to replan and we're locked into the current plan. People can set X to what they want to increase the time that the plan can change at the expense of using more memory and higher latency to get the first tuple.

Maybe warrant a new SQL syntax, like

    select * from table limit 10 queue X