Hacker News new | ask | show | jobs
by londons_explore 2428 days ago
For anyone interested in implementing this, here is what I found last time...

Postgres queries are streamed to the client - ie. some results are delivered before the query is done running. That functionality is necessary for really big resultsets.

That makes it difficult to change plans mid query, because your new plan might return results in a different order, and you need to filter any already-returned results, but you can't afford to keep all of the already-returned results in RAM. Even if that weren't an issue, I'm not even sure that it's always valid to do this.

To add even more complexity... The postgres protocol allows the client to reverse the query (ie. midway through getting the results, the client can say "yo, go back, and return results from earlier again"). It must return the same results in reverse order. That means if you do switch query plans, when the client goes backwards, the server needs to un-switch query plans back to the old plan when going backwards.

These issues are not insurmountable... But they certainly stopped me implementing it in the day I had set aside for the task...

1 comments

I would love to be able to SET multiple_planning; on a cursor to be able to say “yes I know that reversing would be UB, and I have set a limit so I don’t care if results aren’t streamed, so please just try everything you can.” In general I wish there were better ways to tell these systems you know what you’re doing.
There could also be a “clean the slate” streaming directive, which means “I started the query again, please scratch what got delivered already”.
That effictively means the client cannot actually start processing the results before everything is received. Because you would have to undo everything you did with the data you received before. So basically this would simply be a non streaming version.
I believe the best you can do is to delay streaming. Now you have the problem of deciding when to start...