
by jkool702 77 days ago

So I thought about this for a bit, and this actually doesn't surprise me all that much. It makes sense when you consider the following two things:

First, 14k items in batches of 100 is only 140 batches. 140 batches in 160 ms works out to about 875 batches per second, not even 1000. For reference, parallel tops out at around 500 batches per second (it is dreadfully slow), and forkrun, in its normal "passing quoted arguments via the cmdline" mode, can do about 10000 batches per second. I have no doubt rush can distribute batches far quicker than parallel, so there's a good chance that "how fast the parallelization engine can distribute work" isn't the main bottleneck for either frun or rush for this particular workload.
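To make that arithmetic concrete, here is a quick sketch in bash using only the figures quoted above (14k items, batches of 100, 160 ms). The variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Figures quoted in the comment above.
items=14000
batch_size=100
elapsed_ms=160

# Number of batches the engine has to hand out.
batches=$(( items / batch_size ))

# Batches per second, in integer arithmetic: batches / (elapsed_ms / 1000).
batches_per_sec=$(( batches * 1000 / elapsed_ms ))

echo "batches=${batches} batches_per_sec=${batches_per_sec}"
```

That lands below 1000 batches per second, well under the roughly 10000 per second that forkrun's cmdline mode can sustain.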

Second, the way frun distributes batches is very efficient but requires setting up a substantial amount of supporting machinery. This puts (on my system) the "no-load run time" of forkrun at about 80 ms.

    time { echo | frun :; }

    real    0m0.078s
    user    0m0.027s
    sys     0m0.064s

And this 80 ms of startup is pretty close to the time difference you are seeing. I'd bet the "minimum no-load time" for rush is considerably lower, perhaps a couple of ms.

forkrun is optimized for plowing through MASSIVE numbers of very fast-running inputs. In its fastest mode it can get through a billion (empty) inputs a second. 14k inputs just isn't enough to amortize the startup of all the lock-free machinery.

I would venture to guess that if you repeat the same test with 100x more inputs, the relative difference between frun and rush would be considerably smaller.
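A rough back-of-the-envelope version of that guess, as a bash sketch. It rests on two assumptions that are not measurements: that the 160 ms figure is frun's total run time for 14k inputs (so about 80 ms of it is real work), and that the work portion scales linearly with input count while the 80 ms startup stays fixed.

```shell
#!/usr/bin/env bash
startup_ms=80        # no-load time measured above
total_ms_14k=160     # ASSUMPTION: frun's total run time for 14k inputs
work_ms_14k=$(( total_ms_14k - startup_ms ))

# Startup share of total run time, in parts per thousand (integer math,
# truncated). $1 = how many times more inputs than the 14k baseline.
# ASSUMPTION: work time grows linearly with input count.
startup_permille() {
  local work_ms=$(( work_ms_14k * $1 ))
  echo $(( 1000 * startup_ms / (startup_ms + work_ms) ))
}

share_1x=$(startup_permille 1)
share_100x=$(startup_permille 100)

echo "startup share at 14k inputs:  ${share_1x} per mille"
echo "startup share at 1.4M inputs: ${share_100x} per mille"
```

Under those assumptions, startup goes from about half of the run time at 14k inputs to around 1% at 1.4M inputs, which is why the gap should mostly vanish in relative terms.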