by robocat 1286 days ago
Good discussion both for and against the paper here: https://lwn.net/Articles/785430/

Fork causes huge complications. I summarised some of the paper here https://news.ycombinator.com/item?id=31702952

Edit: I imagine forking and signal handlers don’t compose well, and I would also hate to have to think about how forking and SCM_RIGHTS interfere with each other: https://googleprojectzero.blogspot.com/2022/08/the-quantum-s...


Fork is actually very fast. Too funny, a paper coming from Microsoft about fork/exec speed:

https://www.bitsnbites.eu/benchmarking-os-primitives/

Linux absolutely destroys the "proper" API: it is nearly 40x faster at launching a program with fork+exec than Windows is with CreateProcess. Not to mention that vfork, which is even faster, has always been available.

Fork is also pretty scalable, it requires no global locks. It is thread-safe, it has defined semantics in threaded programs and can be used to exec a process. And it isn't insecure, it does what is advertised, as securely as advertised.

And close-on-exec is hardly a huge complication; it's actually a detail of exec(), not fork. It applies independently of fork, and you could make an exec that closes fds by default unless they're marked with a persist-on-exec flag. Library or runtime code can do this anyway, really, without any "huge complication". I don't know what you mean about SCM_RIGHTS interfering with fork; do you have something in mind? The problem would really be at the exec boundary. Fork does not purport to alter any security attributes of the child or parent, so it really doesn't make sense to call it insecure. The child doesn't suddenly get new rights, or have any limits enforced.
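
To make the close-on-exec point concrete, here is a minimal sketch (Python for brevity; `fd_survives_exec` and the `PROBE` script are names made up for this illustration, not anything from the paper). It forks, execs a fresh interpreter, and has that child report whether a given descriptor number is still open. Python marks new descriptors close-on-exec by default, so `os.set_inheritable` is what toggles the flag here.

```python
import os
import sys

# Hypothetical probe script: exits 0 if the fd number passed in argv is open.
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)

def fd_survives_exec(inheritable: bool) -> bool:
    """Return True if a pipe fd is still open in the child after fork + exec."""
    r, w = os.pipe()                    # both ends start out close-on-exec
    os.set_inheritable(w, inheritable)  # clears or sets FD_CLOEXEC on w
    pid = os.fork()
    if pid == 0:
        # Child: fork itself has kept every fd; only exec consults the flag.
        try:
            os.execv(sys.executable, [sys.executable, "-c", PROBE, str(w)])
        finally:
            os._exit(127)               # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    os.close(r)
    os.close(w)
    return os.WEXITSTATUS(status) == 0
```

The flag lives on the descriptor and is acted on at exec time, so a runtime that wants "closed unless marked" behaviour can get it by creating every descriptor with the flag set, which is what Python does.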

I mean it is complicated stuff, but so is any process runtime environment that provides async notifications, threads, spawning, etc. Anybody who tells you they can make this simple and broadly usable is selling you snake oil or a toy API. If people can't cope with reading documentation and thinking carefully about this stuff, they shouldn't use it anyway; they should use a higher-level runtime or library to do process management. The hand-wringing about fork is a bit baffling. It reminds me of the hand-wringing about fsync: it seems that people just don't read documentation, make silly assumptions about how things should work, and then get embarrassed and blame the tools.

I mean fork retains file descriptors from the parent process. This is not some obscure undocumented behavior; it's like the second thing you read in the manual page. Same as execve. I don't like to make excuses for badly designed APIs and code, but honestly, if a programmer isn't capable of thinking about what happens to file descriptors there, they certainly should not be writing code that uses fork or exec, let alone something that's security-sensitive. I don't think that's being unreasonable or elitist. You wouldn't want them writing security-sensitive Windows code either, would you?
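
A small sketch of the documented behaviour, for anyone who hasn't seen it in action (Python for brevity; the function name is invented for this example). The child writes through a pipe descriptor it never opened itself, because fork handed it a copy of the parent's descriptor table.

```python
import os

def child_writes_through_inherited_fd() -> bytes:
    """Fork a child that writes into a pipe the parent created before forking."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: w refers to the same open pipe the parent created.
        try:
            os.write(w, b"hello from child")
        finally:
            os._exit(0)
    os.close(w)               # parent keeps only the read end
    data = os.read(r, 1024)   # blocks until the child has written
    os.waitpid(pid, 0)
    os.close(r)
    return data
```

This is the same mechanism shells rely on to wire up pipelines, and it is also why a descriptor the parent forgot about ends up in the child.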

If you'd read the Microsoft Research PDF linked above, you'd have seen that fork scales with how much memory the parent is using, which would be invisible in these synthetic benchmarks. They say that Chrome on Linux might take up to 100ms to fork. That doesn't scream very fast to me.

> If you'd read the Microsoft Research PDF linked above, you'd have seen that fork scales with how much memory the parent is using, which would be invisible in these synthetic benchmarks.

Ah right, I was talking about SMP scaling, but yes fork does have an O(memory) scaling factor.
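
If anyone wants to see the O(memory) factor on their own machine, a rough sketch follows (Python for brevity; `time_fork` and the ballast sizes are assumptions made for this example, and the numbers will vary a lot with kernel, page size, and huge-page settings). It touches a chosen amount of memory, then times the fork call itself.

```python
import os
import time

def time_fork(resident_mb: int) -> float:
    """Seconds spent in fork() while holding about resident_mb MiB of touched memory."""
    # Nonzero fill so the pages are actually written and resident,
    # not just reserved address space.
    ballast = bytearray(b"\x01") * (resident_mb * 1024 * 1024)
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)           # child does nothing; only fork's own cost matters
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    del ballast
    return elapsed

if __name__ == "__main__":
    for mb in (1, 64, 512):
        print(f"{mb:4d} MiB resident: fork took {time_fork(mb) * 1e3:.3f} ms")
```

The cost being measured is mostly the kernel copying page tables for the parent's address space, so it tracks resident memory rather than what the child goes on to do.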

It certainly shows up on benchmarks because 99% of forking in Unix is on small processes (make, bash, etc.), not huge ones like Chrome. This kind of thing is why the Windows kernel is unable to compete with Linux in performance and scalability in a lot of important basic operations, which is what makes things like git slow there. The focus was on some academically supposedly "correct" interface or way of doing things, not what programs actually want to use.

> They say that Chrome on Linux might take up to 100ms to fork. That doesn't scream very fast to me.

Yeah, probably true. If I needed a facility to be able to exec very frequently in a highly threaded application with a huge memory footprint and significant security concerns, I would almost certainly use a dedicated process to do that. You can potentially use posix_spawn or clone directly, but forking from a thread in the main process in this case just seems unnecessary and asking for portability problems.
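
For reference, the posix_spawn route looks roughly like this (a sketch using Python's `os.posix_spawn` wrapper for brevity; `spawn_and_wait` is a name invented for this example). The caller never runs any of its own code in a forked copy of itself; it just gets back a pid.

```python
import os
import sys

def spawn_and_wait(argv: list[str]) -> int:
    """Launch argv[0] with posix_spawn and return the child's exit status."""
    # argv[0] must be a path to the executable; the environment is passed through.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

How the C library implements posix_spawn underneath is platform-specific, so whether this avoids the O(memory) cost depends on the system.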

I'm not saying fork is perfect or can't be improved, but the hysteria about it "infecting" the whole system and causing some vast problem is just not at all justified.

> You can potentially use posix_spawn or clone directly

You cannot use clone directly if you link to any libc.

You can if you are careful what you use libc for. Of course, you can't use most of its process management or threading, or make any reentrancy or thread-safety assumptions.

I don't mean you would be likely to use it if you were writing a normal application in C, but in a special case like creating your own runtime, where existing interfaces don't do exactly what you like. You wouldn't be using those things in libc anyway in that case.

I would paraphrase your answer as: speed trumps complexity, and any programmer that gets caught out by the complexity is just a bad programmer.

Where speed is critical, other techniques are used to avoid processes. Often when using a separate process, the reason is for security and correctness. I do agree that performance does matter, for example in shell scripts.

Either way, your answer does not address the issues listed in the paper. Yeah, the paper probably has a Microsoft bias, but that doesn’t mean the identified issues should just be hand-waved away for performance reasons.

> I would paraphrase your answer as: speed trumps complexity, and any programmer that gets caught out by the complexity is just a bad programmer.

That's a strawman. A complete mischaracterization of what I wrote.

> Either way, your answer does not address the issues listed in the paper.

The "issues" are just wrong, as I pointed out. Like the laughable "infects the entire system" comment using as their example an implementation of fork in which major layers of the system were entirely unaware of fork. Contradicting themselves with their own example. It's not a paper, it's a rant, and it doesn't somehow gain gravitas or value just by being laid out in a particular way.