by wielebny 1103 days ago
Having used and administered a lot of PostgreSQL servers, I hope they don't lose any stability over this.

I've seen (and reported) bugs that caused panics/segfaults in specific PostgreSQL processes. Not just connection backends, but also processes handling WAL writing or replication. The way it's built right now, a child process can simply be forced to quit without affecting the other processes. Hopefully switching to threads won't force the whole of PostgreSQL to panic and shut down.
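To make the isolation point concrete, here's a minimal POSIX C sketch (not PostgreSQL code; `buggy_worker` and `run_worker_in_child` are hypothetical names). A forked child crashes with SIGSEGV, and the parent just observes the signal through `waitpid` and keeps running, because the child's crash never touched the parent's private memory.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a backend that hits a bug. */
static void buggy_worker(void)
{
    signal(SIGSEGV, SIG_DFL);  /* make sure the default action (terminate) applies */
    raise(SIGSEGV);            /* simulate a segfault without invoking real UB */
}

/* Run `worker` in a forked child and wait for it.
   Returns the signal that killed the child, 0 if it exited normally,
   or -1 if fork/waitpid failed. The caller is unaffected by the crash. */
int run_worker_in_child(void (*worker)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {            /* child: separate address space */
        worker();
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `run_worker_in_child(buggy_worker)` should return `SIGSEGV`, and the calling process carries on afterwards.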


Because of shared memory, most panics and segfaults in a worker process already take down the entire server (this wasn’t always the case, but not doing so was a bug).
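Why shared memory changes the picture, sketched in C (hypothetical struct and function names; assumes Linux/glibc for `MAP_ANONYMOUS`): a child crashes halfway through updating a `MAP_SHARED` region, and the parent is left looking at a half-finished update. Process isolation protects private memory, but anything in the shared segment may be inconsistent after a crash, which is why the whole server has to treat it as suspect.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared structure that several workers update. */
struct shared_state {
    int  update_in_progress;   /* set at the start of an update, cleared at the end */
    long value;
};

/* Returns 1 if, after a child crashes mid-update, the parent finds the
   shared structure half-updated; 0 if not; -1 on error. */
int crash_leaves_shared_state_torn(void)
{
    struct shared_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    memset(st, 0, sizeof *st);

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {                    /* child: the "worker" */
        st->update_in_progress = 1;
        st->value = 42;                /* first half of the update */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);                /* dies before clearing the flag */
        _exit(0);
    }
    if (pid > 0) {                     /* parent: inspect shared memory */
        int status;
        if (waitpid(pid, &status, 0) == pid)
            result = WIFSIGNALED(status) && st->update_in_progress == 1;
    }
    munmap(st, sizeof *st);
    return result;
}
```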
Most likely, the postmaster will stay a separate process, much like today, or like the control process in Firefox or Chrome, one that can catch a panicked process, clean up, and restart it. Data can be recovered from the WAL as well if there were broken transactions in flight.
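A rough C sketch of that supervisor pattern (`supervise` and `flaky_worker` are hypothetical names; this is not the postmaster's actual code): fork a worker, and if it dies abnormally, clean up and fork a fresh one, up to a restart limit. As far as I know, the real postmaster goes further and reinitializes shared memory and runs WAL crash recovery before letting backends back in; that step is only marked by a comment here.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn)(int attempt);

/* Run `worker` in a child; restart it after abnormal exits, at most
   `max_restarts` times. Returns the number of restarts needed before a
   clean exit, or -1 if it never exited cleanly (or fork/wait failed). */
int supervise(worker_fn worker, int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt));   /* child runs the worker */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;           /* clean exit */

        /* Crash or error: a real supervisor would reset shared state and
           replay WAL here before starting a new worker. */
    }
    return -1;
}

/* Hypothetical worker that crashes on its first two runs. */
static int flaky_worker(int attempt)
{
    if (attempt < 2) {
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
    }
    return 0;
}
```

With this worker, `supervise(flaky_worker, 5)` should report 2 restarts, while `supervise(flaky_worker, 1)` gives up with -1.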
100%. Same here. There's a lot of baby in the processes, not just bathwater.

As a longstanding PG dev/DBA who doesn't know much about its internals, I would say that they should just move connection pooling into the main product.

Essentially, pgbouncer should be part of PG and should be able to manage connections with knowledge of what each connection is doing. That, plus some sort of dynamic max-connections setting based on what's actually going on.
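A toy, single-threaded C sketch of the in-server pooling idea, assuming a fixed ceiling of backends lent out to clients and a limit that can be changed at runtime. All names (`struct pool`, `pool_acquire`, `pool_set_limit`, `POOL_CAPACITY`) are hypothetical; this isn't how pgbouncer or PostgreSQL work internally, and a real version would need locking and would only hand a backend back at transaction boundaries.

```c
#include <stdbool.h>

#define POOL_CAPACITY 64   /* hypothetical hard ceiling on server backends */

struct pool {
    bool busy[POOL_CAPACITY];  /* which backends are currently lent out */
    int  in_use;
    int  limit;                /* adjustable "max connections", <= POOL_CAPACITY */
};

static int clamp_limit(int limit)
{
    if (limit < 0) return 0;
    if (limit > POOL_CAPACITY) return POOL_CAPACITY;
    return limit;
}

void pool_init(struct pool *p, int limit)
{
    for (int i = 0; i < POOL_CAPACITY; i++)
        p->busy[i] = false;
    p->in_use = 0;
    p->limit = clamp_limit(limit);
}

/* Lend a backend to a client; -1 means "queue the client and retry". */
int pool_acquire(struct pool *p)
{
    if (p->in_use >= p->limit)
        return -1;
    for (int i = 0; i < POOL_CAPACITY; i++) {
        if (!p->busy[i]) {
            p->busy[i] = true;
            p->in_use++;
            return i;
        }
    }
    return -1;
}

/* Give a backend back, e.g. when the client's transaction ends. */
void pool_release(struct pool *p, int slot)
{
    if (slot >= 0 && slot < POOL_CAPACITY && p->busy[slot]) {
        p->busy[slot] = false;
        p->in_use--;
    }
}

/* "Dynamic max connections": lowering the limit evicts nobody,
   it just refuses new acquires until usage drops below it. */
void pool_set_limit(struct pool *p, int limit)
{
    p->limit = clamp_limit(limit);
}
```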

That'll remove almost all the dev/DBA pain from separate processes.

Of course it will. That's better than continuing to work with damaged memory structures and unpredictable consequences. For a database that matters more than almost anywhere else. Imagine writing corrupted data because another thread went crazy.
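The flip side, sketched in C with POSIX threads (hypothetical names): once workers are threads in one process, a fatal signal in any one of them terminates every thread. The sketch runs the threaded "server" in a forked child only so the caller can observe the outcome.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical buggy "backend thread". */
static void *crashing_thread(void *arg)
{
    (void)arg;
    raise(SIGSEGV);            /* delivered to this thread; default action kills the process */
    return NULL;
}

/* Returns 1 if one crashing thread took down the whole (child) process,
   0 otherwise, -1 on fork/wait failure. */
int thread_crash_kills_process(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                    /* child acts as a threaded server */
        signal(SIGSEGV, SIG_DFL);
        pthread_t t;
        if (pthread_create(&t, NULL, crashing_thread, NULL) != 0)
            _exit(2);
        pthread_join(t, NULL);         /* main thread never gets past this */
        _exit(0);                      /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```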
You're implying that only an OS can provide memory separation between units of execution. In .NET, at least, AppDomains give you the same protection within a single process, so why couldn't Postgres have its own such mechanism? I'd also think that with a database engine, shared state is not just in memory: one process can potentially corrupt the behaviour of another by what it writes to disk. So moving to a single-process model doesn't necessarily introduce problems that could never have existed previously (but, yes, it would arguably make them more likely).
No, AppDomains are not as good as processes. I have tried to go that route before: you cannot reliably stop unruly code in an AppDomain (you must use Thread.Abort(), which is not good), and memory can still leak in any native code used there.

The only reliable way to stop bad code, say an infinite loop, is to run it in another process, even in .NET.
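The same idea in C rather than .NET, as a minimal POSIX sketch (`run_with_timeout`, `runaway` and `quick_task` are hypothetical names): run the untrusted code in a child, poll it with `waitpid(..., WNOHANG)`, and if it overruns its budget, `SIGKILL` it. `SIGKILL` can't be caught or ignored, and killing a process leaves no half-aborted state in the supervisor, which is exactly what an in-process abort can't guarantee.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical unruly code: spins forever. */
static void runaway(void)
{
    volatile unsigned long spins = 0;
    for (;;)
        spins++;
}

/* Hypothetical well-behaved code. */
static void quick_task(void) { }

/* Run `fn` in a child; SIGKILL it if it is still running after
   roughly `timeout_ms`. Returns 1 if it had to be killed,
   0 if it finished on its own, -1 on error. */
int run_with_timeout(void (*fn)(void), int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);
    }

    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* poll every 10 ms */
    int status;
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;                 /* finished in time */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);               /* cannot be caught or ignored */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```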

They also removed AppDomains in later versions of .NET because they offered little benefit and weak protection compared to a full process.

Not claiming they're as good, just noting that there are alternative ways to provide memory isolation boundaries, though obviously if it's not enforced at the language/runtime level, it requires either super strong developer discipline or some other tool to enforce it. I can't find anything suggesting AppDomains have been removed completely, though, just that they're not fully supported on non-Windows platforms. That's interesting; I wonder if it means they do rely on OS-level support.
https://learn.microsoft.com/en-us/dotnet/api/system.appdomai...

"On .NET Core, the AppDomain implementation is limited by design and does not provide isolation, unloading, or security boundaries. For .NET Core, there is exactly one AppDomain. Isolation and unloading are provided through AssemblyLoadContext. Security boundaries should be provided by process boundaries and appropriate remoting techniques."

AppDomains pretty much only let you load and unload assemblies and provided little else. If you wanted to stop bad code, you still used Thread.Abort, which left your runtime in a potentially bad state because there was no isolation between threads.

The only way to make something like an AppDomain replace process isolation would be to rewrite the whole OS in a memory-safe language, as in Midori (https://en.wikipedia.org/wiki/Midori_(operating_system)) or Singularity (https://en.wikipedia.org/wiki/Singularity_(operating_system)).

Is that saying global variables are shared between AppDomains on .NET Core, then? Scary if so: we have a bunch of .NET Framework code we're looking at porting to .NET Core in the near future, and I know it currently relies on AppDomain separation. It's not the first Framework-to-Core conversion I've done, but I don't remember changes in AppDomain behaviour causing any issues the first time.

As it happens, I already know there are bits of code currently not working "as expected" precisely because of AppDomain separation: code that tries to use a shared-memory cache to improve performance and, in one or two cases, to share state. I got the impression whoever wrote that code didn't understand there were even two AppDomains involved; they used various ugly hacks to "fall back" to alternative means of state-sharing, but in fact the fallback is the only thing that ever actually works.

I don't know .NET well enough to comment here, but I'm pretty sure that if you managed to run bare-metal C inside your .NET app (it should be possible), it could easily trample all your domains. RAM is RAM. The only memory protection we have is at the process boundary (even that protection is not perfect with shared memory, but at least it protects private memory).

At least I'm not aware of any way to protect private thread memory from other threads.
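A tiny pthreads illustration of why there's no "private" thread memory (hypothetical names): one thread's stack variable gets overwritten by another thread that holds a pointer to it. Here the pointer is handed over deliberately; in a real bug it would be a stray or dangling pointer, and neither the OS nor the CPU would stop the write, because both threads share one address space.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>

/* Hypothetical misbehaving thread: writes through whatever pointer it has. */
static void *scribbler(void *arg)
{
    int *victim = arg;
    *victim = -1;              /* clobbers another thread's local variable */
    return NULL;
}

/* Returns the value of this thread's "private" stack variable after
   another thread has run, or 0 if the thread could not be created. */
int stack_var_after_other_thread_writes(void)
{
    int private_value = 1234;  /* lives on this thread's stack */
    pthread_t t;
    if (pthread_create(&t, NULL, scribbler, &private_value) != 0)
        return 0;
    pthread_join(t, NULL);
    return private_value;      /* now -1: nothing protected it */
}
```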

Postgres is C and that's not going to change ever.

I certainly wasn't suggesting it would make sense to rewrite Postgres to run on .NET (using any language, even managed C++, assuming anyone still uses that). Yes, it's inherent in C/C++ that code can access any memory the process has access to, and on that basis OS-provided process separation is obviously the "best" protection you can get; I was just pointing out that it's not the only possibility.
.NET is a managed runtime with a VM. In such a language, a memory error in managed code will often trigger a jump back to the VM, which can attempt to recover from there.

For native code, there's no such safety net. Likewise, even in a managed language, an error in the interpreter or runtime code itself will still crash the VM, since there's nothing left to fall back to.

True, if you're talking about unrestricted native code, I'd essentially agree with the OP's implication that only the OS (and the CPU itself) is capable of providing that sort of memory protection. I guess I was just wondering what something like AppDomains in C might even look like (e.g. all global variables are implicitly "thread_local"), and how much could be done at compile time using tools to prevent potentially "dangerous" memory accesses. I've never looked at the Postgres source in any detail, so I'm likely underestimating the difficulty.
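What "globals are implicitly thread_local" means mechanically, as a C11 sketch (hypothetical names): a `_Thread_local` variable gives each thread its own copy, while a plain global is one copy for the whole process. The two threads run one after the other so the plain global isn't raced. Note this gives separation, not protection: any thread holding a pointer to another thread's copy can still write to it.

```c
#include <pthread.h>

static int plain_counter;              /* one copy for the whole process */
static _Thread_local int tls_counter;  /* one copy per thread */

struct result { int tls_seen; };

static void *bump(void *arg)
{
    struct result *r = arg;
    for (int i = 0; i < 1000; i++) {
        plain_counter++;
        tls_counter++;
    }
    r->tls_seen = tls_counter;          /* this thread's own copy */
    return NULL;
}

/* Runs two threads sequentially and reports what each saw.
   Returns 0 on success, -1 if a thread could not be created. */
int tls_demo(int *plain_total, int *tls_t1, int *tls_t2, int *tls_main)
{
    struct result r1 = { 0 }, r2 = { 0 };
    pthread_t t;

    plain_counter = 0;
    if (pthread_create(&t, NULL, bump, &r1) != 0)
        return -1;
    pthread_join(t, NULL);
    if (pthread_create(&t, NULL, bump, &r2) != 0)
        return -1;
    pthread_join(t, NULL);

    *plain_total = plain_counter;       /* shared: accumulates to 2000 */
    *tls_t1 = r1.tls_seen;              /* each thread counted only its own 1000 */
    *tls_t2 = r2.tls_seen;
    *tls_main = tls_counter;            /* main thread's copy was never touched */
    return 0;
}
```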
For a decades-old codebase, probably only the OS can.

The point is that it gets worse if this is changed.