[I wrote this mostly imagining the idea was about converting the entire PostgreSQL service into a single monolithic process. I'm not a fan so far. If it is actually about coalescing like processes into a single multithreaded process, that's more reasonable, but it still comes at a future cost, and whether it's worthwhile is still an open question.]

Converting code into multithreaded code tends to make it harder to test and debug FOREVER. By default it is also more constrained on certain system resources than a multi-process solution. Viewing and managing threads from the outside is harder, and killing a rogue thread is much more likely to crash an MT solution than killing a process is in a typical resilient MP solution. Above all else, I need a database to be utterly reliable (or as close to it as possible). That includes being able to back off gracefully when memory is exhausted (I have overcommit disabled to restore classical memory handling, i.e. malloc() can fail) and when the file system fills up. MT throws a wrench into most of the workings of a complex program, and unless some specific gain can be identified that compensates for adding complexity and fragility to virtually every change going forward, then... um... why?

I read a bit of "well, the other guys are doing it" handwaving: "Other large projects have gone through this transition. It's not easy, but it's a lot easier now than it was 10 years ago. The platform and compiler support is there now, all libraries have thread-safe interfaces, etc."
But that isn't a functional gain. And: "I don't expect you or others to buy into any particular code change at this point, or to contribute time into it. Just to accept that it's a worthwhile goal. If the implementation turns out to be a disaster, then it won't be accepted, of course. But I'm optimistic."
But this is NOT a worthwhile goal. Fun, perhaps. Diverting or challenging, perhaps. A disaster, quite possibly. But without identifying a goal that can only be achieved by walking into the multithreading pit, the project is a waste of time for end users. It may be a growth experience for the experimenters, whether or not it succeeds.
The big functional gain would be better connection handling. The current process-per-connection model carries real overhead (every connection gets its own backend process), and it's pretty common to see large database instances with double-digit max connection limits. Because connections are expensive and in (artificially) limited supply, application developers work around the limitation with connection pooling and/or proxy services.
Theoretically, a multi-threaded Postgres could easily handle thousands of concurrent connections - not just a performance improvement but a game changer for application developer UX. When connections are cheap, the application just connects when it needs to talk to the database, with no pgbouncer or connection pool needed.
I have no idea whether the multi-threading proposal here is viable, but if it can make connections easier to manage, it might be worth it.