by paulddraper 3069 days ago
(1) The benchmark measured the point of discussion.

(2) Even without using asynchronicity (which Go is heralded for), processes take <2ms to start and stop. That's not nothing, but it's certainly something you could do hundreds of times a second.

    $ time seq 1000 | while read; do sleep 0; done

    real        0m1.644s
    user        0m1.065s
    sys         0m0.672s
1 comments

1. No, the point was that fork(2)+exec(3)/spawning processes is an expensive way to run code, not how long it takes for the parent to be able to do something else.

2. Your new benchmark is better. However, it is still a useless microbenchmark, because it measures an unrealistic best case. Your spawn of sleep happens within a fresh subshell started by the pipe you made. The cost of fork(2) depends on things like the parent process's virtual memory size and open file descriptors, and your subshell has basically nothing at all. A real application likely holds at least a few gigabytes of virtual memory (more likely tens of gigabytes; note that virtual memory isn't the same as resident memory), which makes fork(2) take much longer, with the cost split between parent and child.
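To see how spawn cost depends on the parent's size, here is a minimal sketch, not a rigorous benchmark. It assumes a POSIX-style shell such as bash or dash (where sleep is an external command) and GNU date for `%N`. The helper name `spawn_us` and the 64 MiB filler size are hypothetical choices made for illustration. It times `sleep 0` spawns from the current shell, then repeats the measurement after the shell is holding a large variable:

```shell
# Hypothetical helper: print the average microseconds per `sleep 0` spawn
# over $1 iterations. Assumes GNU date, whose %N gives nanoseconds.
spawn_us() {
    _n=$1
    _i=0
    _start=$(date +%s%N)
    while [ "$_i" -lt "$_n" ]; do
        sleep 0                      # external command: one fork + exec each
        _i=$((_i + 1))
    done
    _end=$(date +%s%N)               # adds one extra spawn to the measured span
    echo $(( (_end - _start) / _n / 1000 ))
}

echo "small shell: $(spawn_us 200) us per spawn"

# Make the shell itself hold roughly 64 MiB (an arbitrary illustrative size).
filler=$(head -c 67108864 /dev/zero | tr '\0' 'x')
echo "large shell: $(spawn_us 200) us per spawn"
```

The second figure may come out higher than the first, but by how much depends on the kernel, the shell, and the machine, and a 64 MiB shell is still far smaller than the multi-gigabyte processes described above.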

I suspect you might be confusing asynchronicity with concurrency or parallelism. Go is heralded for concurrency, sometimes in the form of parallelism, but not asynchronicity. Concurrency does not have any positive effect on execution time or cost. Parallelism can reduce execution time, but it does not decrease execution cost; it simply throws more hardware at the problem.

In fact, Go is a worse-than-average language to call fork(2) in, because it runs fork(2) under a global lock. This is mentioned in the linked article. As memory consumption increased, fork(2) took longer, and the resulting contention on that lock was what made the process unresponsive.

However, as I also said, whether fork is too expensive depends on the use-case.

> Each Gitaly server instance was fork/exec'ing Git processes about 20 times per second

> What's really wrong here is that they're apparently spawning processes like crazy.

Sounds like it depends on the use-case, rather than a blanket "two dozen processes per second is clearly absurd".

Definitely. While fork(2) is expensive, a price is useless without also knowing the budget, and how expensive it is depends on the environment.

However, the problem in the posted article was indeed that spawning Git processes 20 times a second in that specific Go application was too much, and the fix was that Go replaced fork(2) with posix_spawn(3).