

by ori_b 5075 days ago

I was curious, so here goes: 'foo' was a file of ~1G containing lines made up of 999 'x's and one '\n'.

    $ ls -lh foo
    -rw-r--r-- 1 ori ori 954M Sep  5 22:57 foo

    $ time cat foo | awk '{print $1}' > /dev/null

    real	0m1.631s
    user	0m1.452s
    sys 	0m0.540s

    $ time awk <foo '{print $1}' > /dev/null 

    real	0m1.541s
    user	0m1.376s
    sys 	0m0.160s

This was run with a warm cache, so that the overhead of the extra IO through the pipe, rather than disk reads, would dominate.
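
For anyone wanting to reproduce this, here is a minimal Python sketch that writes a file shaped like the 'foo' described above. The function name `write_test_file` and its parameters are made up for illustration, and the default of 1,000,000 lines is an assumption: the post does not state the line count, but 1,000,000 lines of 1000 bytes each would be roughly consistent with the 954M that `ls -lh` reports.

```python
import os


def write_test_file(path, lines=1_000_000, width=999):
    """Write `lines` lines, each `width` 'x' characters plus one newline.

    Returns the size of the resulting file in bytes.
    The default line count is an assumption, not taken from the post.
    """
    line = b"x" * width + b"\n"
    batch = 1000  # write 1000 lines per call to keep the write count low
    chunk = line * batch
    full, rest = divmod(lines, batch)
    with open(path, "wb") as f:
        for _ in range(full):
            f.write(chunk)
        f.write(line * rest)  # leftover lines, if any
    return os.path.getsize(path)
```

Calling `write_test_file("foo")` with the defaults writes about 1 GB to disk, so pass a smaller `lines` value for a quick check.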

1 comment

dustyleary 5074 days ago

Both invocations take similar amounts of "real" time because the task is IO-bound and it takes roughly 1.5s on your machine to read the file.

But if you add up the "user" and "sys" time in the cat example, you see that it took 1.992s of actual CPU time, versus 1.536s for the redirect version. That is about a 30% increase in CPU time spent.

The perf decrease wasn't visible because you have multiple cores parallelizing the extra cpu-time, but it was there.
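
The arithmetic behind that can be checked with a short Python sketch. The numbers are copied from the two `time` outputs in the parent post; the variable names are just for illustration. The last two values divide CPU time by wall-clock time, which is one rough way to see whether more than one core was busy: a ratio above 1 is only possible if work ran in parallel.

```python
# Timings in seconds, copied from the two runs in the parent post.
piped = {"real": 1.631, "user": 1.452, "sys": 0.540}       # cat foo | awk ...
redirected = {"real": 1.541, "user": 1.376, "sys": 0.160}  # awk <foo ...

# Total CPU time is user time plus system time.
piped_cpu = piped["user"] + piped["sys"]
redirected_cpu = redirected["user"] + redirected["sys"]

# Relative increase in CPU time caused by the extra cat process and pipe.
increase = piped_cpu / redirected_cpu - 1

# CPU time per second of wall-clock time; above 1 means parallel execution.
piped_ratio = piped_cpu / piped["real"]
redirected_ratio = redirected_cpu / redirected["real"]

print(f"piped CPU time:      {piped_cpu:.3f}s")
print(f"redirected CPU time: {redirected_cpu:.3f}s")
print(f"CPU time increase:   {increase:.1%}")
print(f"CPU/real, piped:      {piped_ratio:.2f}")
print(f"CPU/real, redirected: {redirected_ratio:.2f}")
```

The increase works out to a little under 30%, and only the piped run has a CPU/real ratio above 1, which fits the explanation that the extra work was spread over a second core.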
