Hacker News
by paulddraper 3065 days ago
I am tired of this "processes are expensive" bullcrap. (At least for Linux.)

    $ time seq 1000 | while read; do sleep 0 & done

    real        0m0.185s
    user        0m0.546s
    sys         0m0.265s
That's less than 0.2 ms to start a process.

Processes give you operational control (CPU, memory, permissions, isolation, monitoring) that other constructs simply cannot. Decades ago, when we had far slower computers, people were doing process-oriented development and forking as if it was okay (CGI, make, git).

Somehow, separate processes came to be avoided like the plague, when in reality, they are probably the smallest resource "waste" in 99% of systems.


This is a terrible microbenchmark.

First of all, you're only benchmarking the time it takes for fork(2) to return in the parent subshell, nothing else. The new processes may not have run at all at this point, and certainly haven't exec'd (which tends to be why you're forking).

Second, you're not measuring the cost at all. The forked children will, at some point, start executing on other CPUs, which includes finishing configuration and running exec, which takes time. The cost is the total cycles it takes before the child is executing the intended code.
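
The difference between "fork(2) has returned" and "the child has finished" can be made visible by timing the same loop with and without a final wait. This is a minimal sketch in POSIX shell; the function names spawn_only and spawn_and_wait are made up for illustration and are not part of the benchmark above.

```shell
# Hypothetical helpers for illustration; they use only sleep(1) and the
# shell's own job handling.

# Launch n background children and return as soon as the last one has been
# started. The children may still be running when this returns.
spawn_only() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        sleep 0 &
        i=$((i + 1))
    done
}

# Launch n background children, then block until every one has exited.
spawn_and_wait() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        sleep 0 &
        i=$((i + 1))
    done
    wait
}
```

In bash, the two can be compared like this (the bare wait reaps leftover children from the first call so they don't leak into the second measurement):

    $ time spawn_only 1000; wait
    $ time spawn_and_wait 1000

The second figure should be the closer estimate of what the children actually cost, though it still says nothing about a parent with a large address space.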

Fork is damn expensive, but whether it's too expensive depends on the use case and the cost of expanding hardware.

Fork time scales with the virtual memory of the forking process, and you're forking from a fresh subshell that hardly has anything allocated. It's even mentioned in the linked post that their issue stemmed from this (specifically fork lock contention spiking as fork time increased).

(1) The benchmark measured the point of discussion.

(2) Even without using asynchronicity (which Go is heralded for), processes take <2 ms to start and stop. Not nothing, but certainly something you could do hundreds of times a second.

    $ time seq 1000 | while read; do sleep 0; done

    real        0m1.644s
    user        0m1.065s
    sys         0m0.672s
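
The per-iteration figures quoted in this thread are just the wall-clock time divided by the iteration count. A small helper makes the arithmetic explicit; per_iteration_ms is a made-up name, and awk is assumed to be available.

```shell
# Hypothetical helper: convert a total wall-clock time in seconds and an
# iteration count into milliseconds per iteration.
per_iteration_ms() {
    awk -v total="$1" -v count="$2" \
        'BEGIN { printf "%.3f\n", total * 1000 / count }'
}

per_iteration_ms 0.185 1000   # backgrounded loop: real 0m0.185s
per_iteration_ms 1.644 1000   # sequential loop:   real 0m1.644s
```

This prints 0.185 and 1.644, i.e. under 0.2 ms and under 2 ms per iteration respectively. The two numbers measure different things: the first stops the clock when the loop has finished forking, the second includes each child running to completion.
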
1. No, the point was that fork(2)+exec(3)/spawning processes is an expensive way to run code, not how long it takes for the parent to be able to do something else.

2. Your new benchmark is better. However, it is still a useless microbenchmark, because it is an unrealistic best-case scenario. Your spawn of sleep happens within a fresh subshell started by the pipe you made. The cost of fork(2) depends on things like the virtual memory size and open file descriptors of the parent process, and your subshell has basically nothing at all. A real application likely holds at least a few gigabytes of virtual memory (more likely tens of gigabytes; note that virtual memory isn't the same as resident memory), which will make fork(2) take much longer, with the extra time split between parent and child.
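
One rough way to see the parent-size effect from a shell is to inflate the shell's own heap and then time a loop of forks. This is a sketch under assumptions, not a rigorous benchmark: grow_heap and fork_loop are made-up names, head -c is assumed to be available (it is on GNU and BSD systems), and the ballast variable only stands in for a real application's memory.

```shell
# Hypothetical helpers for illustration only.

# Store a string of the given number of bytes in the global variable
# "ballast", so that the shell itself holds that much extra memory.
grow_heap() {
    ballast=$(head -c "$1" /dev/zero | tr '\0' 'x')
}

# Fork n subshells one after another, waiting for each to exit.
# In bash and dash, every "( : )" forks the current shell and runs no
# external program, so the loop mostly measures fork and exit.
fork_loop() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        ( : )
        i=$((i + 1))
    done
}
```

A comparison in bash might look like this (200000000 bytes is roughly 200 MB, still far smaller than the gigabytes mentioned above):

    $ time fork_loop 1000
    $ grow_heap 200000000
    $ time fork_loop 1000

On Linux the second loop would be expected to take longer, since each fork has to copy page tables for the larger heap, but by how much will depend on the kernel and the shell.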

I suspect you might be confusing asynchronicity with concurrency or parallelism. Go is heralded for concurrency, sometimes in the form of parallelism, but not asynchronicity. Concurrency does not have any positive effect on execution time or cost. Parallelism can reduce execution time, but it does not decrease execution cost; it simply throws more hardware at the problem.

In fact, Go is a worse-than-average language to call fork(2) in, due to it running fork(2) under a global lock. This is mentioned in the linked article. The lock contention caused by fork(2) execution time as memory consumption increased was what made the process unresponsive.

However, as I also said, whether fork is too expensive depends on the use-case.

> Each Gitaly server instance was fork/exec'ing Git processes about 20 times per second

> What's really wrong here is that they're apparently spawning processes like crazy.

Sounds like it depends on the use-case, rather than a blanket "two dozen processes per second is clearly absurd".

Definitely. While fork(2) is expensive, a price is useless without also knowing the budget, and how expensive it is depends on the environment.

However, the problem in the posted article was indeed that spawning Git processes 20 times a second in that specific Go application was too much, and the fix was that Go replaced fork(2) with posix_spawn(3).