by zeroimpl 1098 days ago

However, it's already the case that if a postgres process crashes, the whole cluster gets restarted. I've occasionally seen this message:

    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    LOG: all server processes terminated; reinitializing

2 comments

lelanthran 1097 days ago

> However, it's already the case that if a postgres process crashes, the whole cluster gets restarted. I've occasionally seen this message:

Sure, but the blast radius of corruption is limited to that shared memory, not all the memory of all the processes. You can at least use the fact that a process has crashed to ensure that the corruption doesn't spread.

(This is why it restarts: there is no guarantee that the shared memory is valid, so the other processes are stopped before they attempt to use that potentially invalid memory)

With threads, all memory is shared memory. A single crashing thread can invalidate other threads' data before the crash is detected.
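
To make the "one process crashes, everything restarts" behaviour concrete, here is a minimal sketch of a supervisor loop in Python. It is not how postmaster is implemented; the function name `run_cluster_once` and the return strings are made up for illustration, and ordinary child processes stand in for backends. The only idea it shows is that an abnormal exit of any child makes the supervisor stop all the others before they can touch state that may be corrupted.

```python
import subprocess
import sys
import time


def run_cluster_once(worker_cmds):
    """Hypothetical supervisor: start workers, stop all of them if one dies abnormally."""
    workers = [subprocess.Popen(cmd) for cmd in worker_cmds]
    try:
        while True:
            for w in workers:
                code = w.poll()
                if code is not None and code != 0:
                    # Abnormal exit: assume shared state may be corrupted,
                    # so terminate every worker that is still running.
                    for other in workers:
                        if other.poll() is None:
                            other.terminate()
                    for other in workers:
                        other.wait()
                    return "reinitialize"
            if all(w.poll() is not None for w in workers):
                # Every worker exited with status 0.
                return "clean shutdown"
            time.sleep(0.01)
    finally:
        # Safety net so the sketch never leaks child processes.
        for w in workers:
            if w.poll() is None:
                w.kill()
                w.wait()


if __name__ == "__main__":
    healthy = [sys.executable, "-c", "import time; time.sleep(30)"]
    crasher = [sys.executable, "-c", "import os; os.abort()"]
    print(run_cluster_once([healthy, crasher, healthy]))
```

The supervisor itself survives because it is a separate process, which is the property a threaded design would need to keep for whatever plays that role.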

niccl 1098 days ago

Yes, but postmaster is still running to roll back the transaction. If you crash a single multi-threaded process, you may lose postmaster as well, and then sadness would ensue.

mattashii 1098 days ago

The threaded design wouldn't necessarily be single-process; it would just not have one process for every connection. Things like crash detection could still be handled in a separate process. The reason to use threading in most cases is to reduce communication and switching overhead, but for low-traffic backends like a crash handler the overhead of it being a process is quite limited. When it gets triggered, context-switching overhead is the least of your problems.

Yoric 1098 days ago

Seconded. For instance, Firefox's crash reporter has always been a separate process, even at the time Firefox was mostly single-process, single-threaded. Last time I checked, this was still the case.

jtc331 1098 days ago

If you read the thread you’d see the discussion includes still having e.g. postmaster as a separate process.

cyberax 1098 days ago

PostgreSQL can recover from abruptly aborted transactions (think "pulled the power cord") by replaying the journal. This is not going to change anyway.

cogman10 1098 days ago

Transaction rollback is part of the WAL. Databases write to disk an intent to change things, what should be changed, and a "commit" of the change when finished, so that all changes happen as a unit. If the DB process is interrupted during that log write, then all changes associated with that transaction are rolled back.

Threaded vs process won't affect that.
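
As a rough illustration of that idea, here is a toy redo-only log in Python. It is a sketch under heavy assumptions, not PostgreSQL's actual WAL format or recovery algorithm: the record shapes (`"set"`, `"commit"`) and the `recover` function are invented for the example. Recovery applies only the changes of transactions whose commit record made it into the log, so a transaction cut off mid-write simply never takes effect.

```python
def recover(log):
    """Rebuild state from a toy log, keeping only committed transactions."""
    # First pass: find every transaction that reached its commit record.
    committed = {txid for kind, txid, *_ in log if kind == "commit"}

    # Second pass: replay changes in log order, skipping uncommitted ones.
    state = {}
    for kind, txid, *payload in log:
        if kind == "set" and txid in committed:
            key, value = payload
            state[key] = value
    return state


if __name__ == "__main__":
    log = [
        ("set", 1, "a", 10),
        ("commit", 1),
        ("set", 2, "a", 99),
        ("set", 2, "b", 5),
        # Crash here: transaction 2 never wrote its commit record.
    ]
    print(recover(log))
```

Nothing in this scheme depends on whether the writer was a thread or a process; it only depends on what reached the log before the interruption.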

dfox 1098 days ago

Running the whole DBMS as a bunch of threads in a single process changes how fast it recovers from some kind of temporary inconsistency. In an ideal world this should not happen, but in reality it does, and you do not want to bring the whole thing down because of some superficial data corruption.

On the other hand, all the cases of fixable corrupted data in PostgreSQL I have seen were the result of somebody doing something totally dumb (rsyncing a live cluster, even between architectures), while on InnoDB it seems to happen somewhat randomly, without anybody obviously doing something stupid.

anarazel 1098 days ago

We would still have a separate process doing that part of postmaster's work.

tracker1 1098 days ago

You can still have a master control process separate from the client connections.

moonchrome 1098 days ago

Restart on crash doesn't sound that difficult to do.
