by masklinn
817 days ago
The latter is faster in actual CPU time. Note, however, that in TFA the measurement only starts when the program starts, not when the pipeline starts. Because compilation overlaps with the pipe filling up, time spent blocking on the pipe is mostly excluded from the measurement in the former case (by the time the program starts there’s enough data in the pipe that the program can slurp a bunch of it without waiting, especially when reading it byte by byte), but included in the latter.
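
To make the effect concrete, here is a rough sketch rather than TFA's actual setup. It assumes a producer that trickles data into a pipe and a consumer that times itself from its own start; `slow_producer`, `timed_consumer`, `run`, the chunk size and the delays are all made up for illustration, and the `startup_delay` sleep merely stands in for compilation.

```python
import os
import threading
import time


def slow_producer(write_fd, chunks, delay):
    # Hypothetical upstream stage: trickles small chunks into the pipe.
    with os.fdopen(write_fd, "wb") as w:
        for _ in range(chunks):
            w.write(b"x" * 512)
            w.flush()
            time.sleep(delay)
    # Closing the write end signals EOF to the reader.


def timed_consumer(read_fd):
    # The clock starts with the consumer, not with the pipeline.
    start = time.perf_counter()
    total = 0
    with os.fdopen(read_fd, "rb", buffering=0) as r:
        while True:
            chunk = r.read(4096)  # blocks while the pipe is empty
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def run(startup_delay, chunks=20, delay=0.01):
    # Returns (bytes read, time seen by the consumer, time for the whole pipeline).
    read_fd, write_fd = os.pipe()
    producer = threading.Thread(
        target=slow_producer, args=(write_fd, chunks, delay)
    )
    pipeline_start = time.perf_counter()
    producer.start()
    time.sleep(startup_delay)  # stand-in for compile time before the program runs
    total, inner = timed_consumer(read_fd)
    producer.join()
    outer = time.perf_counter() - pipeline_start
    return total, inner, outer


if __name__ == "__main__":
    for label, startup in (("immediate start", 0.0), ("delayed start", 0.5)):
        total, inner, outer = run(startup)
        print(f"{label}: {total} bytes, self-timed {inner:.3f}s, pipeline {outer:.3f}s")
```

With the delayed start the pipe is typically already full (and closed) when the consumer's clock starts, so its self-reported time shrinks even though the pipeline as a whole took longer.
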
The amount of input data here is laughably small for it to result in such a huge timing discrepancy.
I wonder if there’s an added element where the constant syscalls are reading on a contended mutex and that contention disappears if you delay the start of the program.