by rektide 1098 days ago
This feels a bit like you are using an image of absolute safety to hold hostages, not allowing the potential for change & improvement.

The author starts by citing a decent variety of sources who have already expressed interest here and who see this as progress.

Migrating a bunch of per-process global variables to have scope (per thread or per session) may be risky, but gee, it just sounds like vaguely reasonable architecture to have these days to me.


You can usually find developers interested in any fashionable approach to a problem. Change is fine. What improvement, specifically, though? Adding multithreading is not a functional improvement in and of itself, but more the opposite. MT should be used when a specific, important functional gain can be realized through no other approach.

I'm not trying to win an argument or anything here, I'm just highlighting from my and others' experiences that multithreading is a tradeoff not to be made casually. It makes some things faster, especially if not I/O bound, but it also increases dev and debug cost, and reduces the number of developers who can assist. That downside tends to be permanent.

That's fair. It does seem like no one else on the planet still uses multi-process architecture, that the performance has never been there.

This was the famous evolution of Apache Httpd 1 being a forking multi-process model, and in v2 gaining a new pluggable strategy system, including thread pooled models. For great scalability wins. https://httpd.apache.org/docs/2.4/mod/worker.html

Context switching between processes is just such a taxing thing to do. So many caches to reset. Especially with all the mitigations most folks run, it's such a drain.

However, that Apache situation, multithreading *like* things together (request handling), is a more reasonable act than say, turning all of PostgreSQL into a monolithic process. PostgreSQL is a much more heterogeneous system than Apache, with potentially more interesting ways to lock up than Apache with its rather simple overall mission.

Sure it sounds interesting to try a branch of pg with, for example, just the sessions being multithreaded - but then how DOES one forcibly stomp on some session that has grabbed some critical lock without crashing other users' sessions? Killing off a session's thread inside of a MT'ed session handler without putting any other threads at risk would be the first problem (and an admin is likely to use "ps -Lef" to find the thread ID and then "kill"). Many MT programs I see lose their little minds if a thread is killed from outside.

Going too crazy with threads can also cause performance issues, since there is overhead - just less than for processes - around thread creation/switching/etc, and is why thread pools are common. There's a short article about this at:

    https://stackoverflow.com/questions/5961536/what-is-best-a-single-threaded-or-a-multi-threaded-server/5964238#5964238

There's some theory that multithreading to handle a bunch of fds and using poll / nonblocking I/O in a single-threaded solution are equivalent at some level in computing science, but skill sets tend to matter more in practice.
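
To make that comparison concrete, here is a minimal Python sketch (not from the thread; the helper names `serve_with_threads`, `serve_with_selectors` and friends are made up for illustration). It serves the same handful of fds both ways: a small thread pool where each worker blocks on its own fd, and a single thread multiplexing nonblocking fds through the standard `selectors` module, which wraps poll/epoll/kqueue. The "service" just uppercases whatever each client sent, and it assumes each request fits in one small read.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def make_clients(messages):
    """Create one connected socket pair per message and send it from the client end."""
    pairs = []
    for msg in messages:
        server_side, client_side = socket.socketpair()
        client_side.sendall(msg)
        pairs.append((server_side, client_side))
    return pairs


def handle(conn):
    """Blocking handler: read one request, write one reply."""
    data = conn.recv(1024)
    conn.sendall(data.upper())


def serve_with_threads(pairs, workers=4):
    """Multithreaded style: pooled workers, each blocking on a single fd."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, [server_side for server_side, _ in pairs]))


def serve_with_selectors(pairs):
    """Single-threaded style: nonblocking fds multiplexed by one event loop."""
    sel = selectors.DefaultSelector()
    for server_side, _ in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
    pending = len(pairs)
    while pending:
        for key, _ in sel.select():
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())  # tiny reply, fits in the socket buffer
            sel.unregister(conn)
            pending -= 1
    sel.close()


def collect(pairs):
    """Read each reply from the client end, then close both ends."""
    replies = []
    for server_side, client_side in pairs:
        replies.append(client_side.recv(1024))
        server_side.close()
        client_side.close()
    return replies


def run_both(messages):
    """Serve the same requests with each model and return both sets of replies."""
    pairs = make_clients(messages)
    serve_with_threads(pairs)
    threaded = collect(pairs)

    pairs = make_clients(messages)
    serve_with_selectors(pairs)
    evented = collect(pairs)
    return threaded, evented
```

Both paths produce the same replies; what differs is who does the waiting (the kernel scheduler across threads, or one loop around `select()`), which is roughly where the skill-set question comes in.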

This is a pretty good page on the options in general, though dated (anyone already know of a newer equivalent to it?):

    http://www.kegel.com/c10k.html

I feel sure work has been put into making kernel support for both MT *and* the poll and nonblocking I/O models more efficient since then. :-)

> Sure it sounds interesting to try a branch of pg with, for example, just the sessions being multithreaded - but then how DOES one forcibly stomp on some session that has grabbed some critical lock without crashing other users' sessions?

Presumably as one does now, through pg_terminate_backend()/pg_cancel_backend().
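
For anyone who hasn't used them: both are plain SQL functions that take a backend PID (as listed in `pg_stat_activity`) and return a boolean. `pg_cancel_backend()` cancels the backend's current query, while `pg_terminate_backend()` ends the whole session. A minimal Python sketch, assuming a DB-API cursor with the `%s` paramstyle (psycopg uses that) and a role permitted to signal the target backend; the helper name `stop_backend` is hypothetical:

```python
def stop_backend(cursor, pid, terminate=False):
    """Cancel a backend's current query, or terminate its whole session.

    `cursor` is assumed to be a DB-API cursor using the %s paramstyle.
    This helper is illustrative only; it is not part of PostgreSQL or psycopg.
    """
    func = "pg_terminate_backend" if terminate else "pg_cancel_backend"
    # The function name comes from the fixed choice above, never from user
    # input; the pid is passed as a bound parameter.
    cursor.execute(f"SELECT {func}(%s)", (pid,))
    row = cursor.fetchone()
    return bool(row[0])  # both functions return a boolean
```

The point for this discussion is that the admin names a session and the server does the stopping, so the interface need not care whether that session is a process or a thread.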

> Killing off a session's thread inside of a MT'ed session handler without putting any other threads at risk would be the first problem (and an admin is likely to use "ps -Lef" to find the thread ID and then "kill")

ps+kill already puts all of Postgres' processes at risk in Postgres' MP system, because processes that unexpectedly exit may have corrupted shared state, so in those situations PG restarts. MT would not significantly change that.

> Going too crazy with threads can also cause performance issues, since there is overhead - *just less than for processes* - around thread creation/switching/etc, and is why thread pools are common.

(emphasis mine)

Considering that PostgreSQL currently is a multi-process architecture, surely replacing the Process primitive with the Thread primitive will reduce the overhead of connection backends, all else being equal.
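
If you want a rough feel for the creation-cost half of that claim on your own machine, here is a small POSIX-only Python sketch (the function names are made up for illustration). It times creating and reaping N do-nothing threads versus N forked processes. Python threads and `os.fork()` are only a loose proxy for what a C server like PostgreSQL pays, it says nothing about context-switch or memory costs, and the numbers will vary by OS and hardware.

```python
import os
import threading
import time


def time_threads(n):
    """Seconds taken to start and join n threads that do nothing."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return time.perf_counter() - start


def time_processes(n):
    """Seconds taken to fork and reap n child processes that exit immediately."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once, skipping interpreter cleanup
        os.waitpid(pid, 0)  # parent: reap the child
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200
    print(f"threads:   {time_threads(n):.4f}s for {n}")
    print(f"processes: {time_processes(n):.4f}s for {n}")
```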