by perihelions 1507 days ago
That's funny, 23 MiB/s is exactly what I get when reading systemd logs (on an NVMe SSD). Is it supposed to be otherwise?

    $ sudo journalctl -r | pv -a > /dev/null
    [22.8MiB/s]

That appears to be systemd being slow.

  $ dd if=/dev/urandom of=test bs=1G count=1 iflag=fullblock
  $ gzip -k test
  $ zcat test.gz | pv -a >/dev/null
  [ 228MiB/s]

  $ sudo journalctl -r | pv -a >/dev/null
  [13.1MiB/s]

UPDATE: Gzip with more real-world data[1]:

  $ gzip -k adventures-of-huckleberry-finn.txt
  $ zcat adventures-of-huckleberry-finn.txt.gz | pv -a >/dev/null
  [ 151MiB/s]
[1]: <https://gutenberg.org/files/76/76-0.txt>
By "compressing" random data you are bypassing gzip, since it will just store your data as uncompressed blocks, making "decompression" essentially a memcpy.

With real data, deflate tops out at roughly the same speed anyway, but that is a bit of a coincidence.

With modern CPUs getting ever smaller IPC improvements, this will likely be pretty much the max decompression speed we can expect from gzip going forward.
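
A quick way to see the incompressible-input effect is to compare compressed sizes for random bytes and for repetitive text. This is only a sketch: the file names and the 1 MiB size are arbitrary choices for illustration, not anything from the benchmark above.

```shell
# Sketch: gzip on random bytes vs. repetitive text.
# All file names below are hypothetical; 1 MiB keeps the run short.
tmp=$(mktemp -d)
size=1048576

# Random bytes have no redundancy for deflate to exploit.
head -c "$size" /dev/urandom > "$tmp/random.bin"
# Repetitive text is the opposite extreme and compresses very well.
yes 'the quick brown fox jumps over the lazy dog' | head -c "$size" > "$tmp/text.txt"

# -c writes to stdout, so the input files are left in place.
gzip -c "$tmp/random.bin" > "$tmp/random.bin.gz"
gzip -c "$tmp/text.txt" > "$tmp/text.txt.gz"

# $(( ... )) strips the padding some wc implementations add.
raw_size=$(( $(wc -c < "$tmp/random.bin") ))
random_gz_size=$(( $(wc -c < "$tmp/random.bin.gz") ))
text_gz_size=$(( $(wc -c < "$tmp/text.txt.gz") ))

echo "random: $raw_size -> $random_gz_size bytes"
echo "text:   $raw_size -> $text_gz_size bytes"
```

The random file should come out slightly larger than its input (gzip header and trailer plus per-block overhead), while the text file shrinks to a small fraction. Timing `zcat` on the first therefore says little about how fast deflate decodes real data.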

It actually made the file bigger (1.1G from 1.0G) :)

I was getting the same numbers with text data I have scattered around my disk, but those files were small, so I decided to generate a bigger one. But yes, I agree a more robust benchmark would use real text, e.g. a Mark Twain novel.

I think the lack of speed here is more that it has to serialize the data from disk into a readable format. Because of this, I assume using the `--grep=` option is faster than piping the output through grep.
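
One rough way to check that assumption is to run both variants over the same time window and compare the averages `pv` reports. The pattern and the one-hour window here are placeholders. As far as I know, `--grep=` matches only the MESSAGE field and is case-insensitive for an all-lowercase pattern, so the two pipelines will not produce identical output.

```shell
# Sketch: journalctl's built-in filter vs. an external grep.
# 'error' and the one-hour window are placeholder choices.
pattern='error'

if command -v journalctl >/dev/null 2>&1 && command -v pv >/dev/null 2>&1; then
  # Filter inside journalctl: non-matching entries are never formatted.
  journalctl -r --since "1 hour ago" --grep="$pattern" | pv -a > /dev/null
  # Filter outside: every entry is formatted first, then grep discards most.
  journalctl -r --since "1 hour ago" | grep -i "$pattern" | pv -a > /dev/null
  mode=compared
else
  echo "journalctl or pv not available; skipping comparison"
  mode=skipped
fi
```

Since `pv` sits after the filter in both cases, it measures matched output only. A higher rate for the first pipeline would mean the same matches arrived sooner, not that more data was read.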

    --grep=
That's exactly what I needed to know! I'm glad I asked the stupid question. Thank you!