by perihelions 1507 days ago

That's funny, 23 MiB/s is exactly what I get when reading systemd logs (on an NVMe SSD). Is it supposed to be faster?

    $ sudo journalctl -r | pv -a > /dev/null
    [22.8MiB/s]

yakubin 1507 days ago

That appears to be systemd being slow.

  $ dd if=/dev/urandom of=test bs=1G count=1 iflag=fullblock
  $ gzip -k test
  $ zcat test.gz | pv -a >/dev/null
  [ 228MiB/s]

  $ sudo journalctl -r | pv -a >/dev/null
  [13.1MiB/s]

UPDATE: Gzip with more real-world data[1]:

  $ gzip -k adventures-of-huckleberry-finn.txt
  $ zcat adventures-of-huckleberry-finn.txt.gz | pv -a >/dev/null
  [ 151MiB/s]

[1]: <https://gutenberg.org/files/76/76-0.txt>

klauspost 1507 days ago

By "compressing" random data you are bypassing gzip: it will just store your data as uncompressed blocks, which makes "decompression" a memcpy.

With real data, deflate decompression maxes out somewhere around that speed anyway, but the match is a bit coincidental.

With modern CPUs getting ever smaller IPC improvements, this is likely close to the maximum decompression speed we can expect from gzip going forward.
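
The stored-block point is easy to check by comparing compressed sizes. What follows is only a rough sketch, not a benchmark: the 1 MiB size and the variable names `random_gz` and `text_gz` are arbitrary choices for illustration. Random bytes should come out slightly larger than the input, while repetitive data should shrink to a small fraction of it.

```shell
# Sketch: gzip output size for random bytes vs. highly repetitive bytes.
# The 1 MiB size and the variable names are arbitrary, for illustration only.
size=1048576

# Random input: effectively incompressible, so gzip should only add overhead.
random_gz=$(head -c "$size" /dev/urandom | gzip -c | wc -c)

# Repetitive input (a single repeated letter): compresses extremely well.
text_gz=$(head -c "$size" /dev/zero | tr '\0' 'a' | gzip -c | wc -c)

echo "random: $size -> $random_gz bytes"
echo "repeat: $size -> $text_gz bytes"
```

Only output size is shown here; to compare decompression speed, feed both kinds of input through `zcat | pv -a` as in the comments above.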

yakubin 1507 days ago

It actually made the file bigger (1.1G from 1.0G) :)

I was getting the same numbers with text data I have scattered around my disk, but those files were small, so I decided to generate a bigger one. But yes, I agree a more robust benchmark would use real text, e.g. a Mark Twain novel.

erk__ 1507 days ago

I think the slowness here comes more from journalctl having to serialize the data from disk into a readable format. I assume that is why using the `--grep=` option is faster than piping the output through grep.
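
A minimal way to compare the two approaches, assuming bash (for the `time` keyword) and a journalctl built with `--grep` support. `PATTERN` is a hypothetical placeholder, and `ran` is just a marker variable for this sketch. The two commands are not strictly equivalent: `--grep` matches only the MESSAGE field of each entry, whereas piping through grep matches the whole formatted line.

```shell
# Sketch: filter inside journalctl vs. piping its full output through grep.
# PATTERN is a hypothetical placeholder; use something that occurs in your logs.
PATTERN='error'

if command -v journalctl >/dev/null 2>&1; then
    # Filtering inside journalctl (matches the MESSAGE field only).
    time journalctl --no-pager --grep="$PATTERN" > /dev/null || true
    # Filtering outside: every entry is formatted as text, then grep scans it.
    time journalctl --no-pager | grep -- "$PATTERN" > /dev/null || true
    ran=yes
else
    echo "journalctl not found; skipping comparison"
    ran=skipped
fi
```

Add `sudo` as in the original commands if you want the system journal rather than only the entries your user can read.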

perihelions 1507 days ago

    --grep=

That's exactly what I needed to know! I'm glad I asked the stupid question. Thank you!
