Hacker News
by nemothekid 4805 days ago
If I'm reading this right, with ZFS compression enabled I am seeing 1/3 the disk usage and a 3x speedup in query times just from switching the filesystem. Stats like that make me very skeptical. Does this mean that I can get a 3x speedup while cutting my disk usage to a third just by switching to ZFS? If so, why isn't everyone doing this?

Performance gains will depend in part on the compressibility of the data being written. If it is highly compressible (text, sparse structures like database pages), the performance gain can be significant. Binary data, or any data that compresses poorly with the algorithms ZFS supports, will see less benefit.
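A rough way to see the compressibility difference is to compress two kinds of data with Python's zlib module. zlib implements DEFLATE, the algorithm behind ZFS's gzip compression setting; ZFS's faster options such as lzjb or lz4 will usually give lower ratios, so treat the numbers as qualitative. The log line format below is made up for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Uncompressed size divided by zlib (DEFLATE) compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Text-like data: repetitive, made-up query-log lines.
text = b"".join(
    b"LOG:  duration: %d.%03d ms  statement: SELECT * FROM t WHERE id = %d\n"
    % (i % 50, i % 1000, i)
    for i in range(20_000)
)

# Incompressible data: random bytes, standing in for already-compressed
# media or encrypted blobs.
noise = os.urandom(len(text))

text_ratio = compression_ratio(text)
noise_ratio = compression_ratio(noise)
print(f"text-like data: {text_ratio:.1f}x smaller")
print(f"random bytes:   {noise_ratio:.2f}x (no gain)")
```

The text-like buffer shrinks several-fold, while the random buffer stays at roughly its original size (DEFLATE can even add a few bytes of overhead).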
Please also keep in mind that this blog post focuses on a workload that is completely disk I/O bound.

In practice, at least part of your working set gets served from memory, and compression doesn't help with the pages that are already in memory.
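This is essentially Amdahl's law: compression only speeds up the share of runtime spent waiting on disk. A minimal sketch, assuming (hypothetically) that compression makes the disk-bound share 3x faster and leaves memory-served work unchanged:

```python
def amdahl_speedup(io_fraction: float, io_speedup: float) -> float:
    """Overall speedup when only the disk-bound share of runtime improves.

    io_fraction: share of uncompressed runtime spent waiting on disk (0..1)
    io_speedup:  factor by which that disk-bound share gets faster
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    # The non-I/O share is unchanged; the I/O share shrinks by io_speedup.
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


for frac in (1.0, 0.75, 0.5, 0.25):
    print(f"{frac:.0%} disk-bound -> {amdahl_speedup(frac, 3.0):.2f}x overall")
```

Only a completely disk-bound workload, like the one in the blog post, sees the full 3x; at 50% disk-bound the same I/O improvement yields 1.5x overall.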

The way I make sense of this is that you need fewer (slow) disk reads to get the same amount of data into RAM, so that might explain the speedup?

I agree that it sounds too good to be true though.

Your read is correct. Once the CPU time spent on decompression dropped below the disk wait time for the same data uncompressed, the reduced I/O from compression started to win, sometimes massively. As powerful as processors are these days, results like these aren't impossible, or even terribly unlikely.
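The trade-off can be modeled as a back-of-the-envelope calculation. This is a sketch under simplifying assumptions (constant disk and decompression throughput, no caching), and all the numbers in it are hypothetical. If reads and decompression fully overlap, compression wins exactly when decompressing the data takes less time than reading it uncompressed, which is the condition described above; if they run one after the other, reading the smaller compressed file also has to fit in that budget.

```python
def uncompressed_read_s(size_mb: float, disk_mb_s: float) -> float:
    """Time to read size_mb straight off disk."""
    return size_mb / disk_mb_s


def compressed_read_s(size_mb, ratio, disk_mb_s, decomp_mb_s, pipelined=False):
    """Time to read the compressed form of size_mb and decompress it.

    ratio:       uncompressed size / compressed size
    decomp_mb_s: decompression throughput, in uncompressed MB per second
    pipelined:   if True, assume disk reads and decompression fully overlap
    """
    io_s = (size_mb / ratio) / disk_mb_s  # fewer bytes to read from disk
    cpu_s = size_mb / decomp_mb_s         # extra CPU work to inflate them
    return max(io_s, cpu_s) if pipelined else io_s + cpu_s


# Hypothetical numbers: 1000 MB of data, a 100 MB/s disk,
# 400 MB/s decompression, 3x compression ratio.
plain = uncompressed_read_s(1000, 100)                             # 10.0 s
serial = compressed_read_s(1000, 3, 100, 400)                      # ~5.83 s
overlapped = compressed_read_s(1000, 3, 100, 400, pipelined=True)  # ~3.33 s
print(f"uncompressed {plain:.2f}s, serial {serial:.2f}s, pipelined {overlapped:.2f}s")
```

With poorly compressible data and slow decompression (say a 1.1x ratio at 100 MB/s), the same function shows compression losing to the plain read.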

Consider the analogous (if simplified) case of logfile parsing, from my production syslog environment, with full query logging enabled:

  # ls -lrt
  ...
  -rw------- 1 root root  828096521 Apr 22 04:07 postgresql-query.log-20130421.gz
  -rw------- 1 root root 8817070769 Apr 22 04:09 postgresql-query.log-20130422
  # time zgrep -c duration postgresql-query.log-20130421.gz
  19130676

  real	0m43.818s
  user	0m44.060s
  sys	0m6.874s
  # time grep -c duration postgresql-query.log-20130422
  18634420

  real	4m7.008s
  user	0m9.826s
  sys	0m3.843s
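The figures in the transcript can be turned into ratios directly. Note that the two files are logs from consecutive days (different dates, different match counts), so this compares similar rather than identical data, and the file-size ratio is only a proxy for gzip's compression ratio on either day.

```python
# Figures copied from the ls -lrt and time(1) output above.
gz_bytes = 828_096_521         # postgresql-query.log-20130421.gz
raw_bytes = 8_817_070_769      # postgresql-query.log-20130422 (a different day)
zgrep_real_s = 43.818          # real 0m43.818s
grep_real_s = 4 * 60 + 7.008   # real 4m7.008s

size_ratio = raw_bytes / gz_bytes          # roughly 10.6x smaller on disk
wall_speedup = grep_real_s / zgrep_real_s  # roughly 5.6x faster wall-clock
grep_mb_per_s = raw_bytes / grep_real_s / 1e6  # plain grep's effective read rate

print(f"size ratio {size_ratio:.2f}x, speedup {wall_speedup:.2f}x, "
      f"plain grep read ~{grep_mb_per_s:.1f} MB/s")
```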
EDIT: I'm not sure why time(1) is reporting more "user" time than "real" time in the compressed case.
zgrep runs grep and gzip as two separate subprocesses, so on a machine with multiple CPUs the job as a whole can accumulate more CPU time than wall-clock time. That just shows you exploited some parallelism, with grep and gzip running simultaneously for part of the time.
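Dividing total CPU time (user + sys) by wall-clock time from the two time(1) runs makes this concrete. A value above 1.0 means more than one CPU was busy on average; a value far below 1.0 means the process spent most of its time waiting, which is consistent with the uncompressed grep being disk-bound.

```python
def cpu_utilization(real_s: float, user_s: float, sys_s: float) -> float:
    """(user + sys) / real, i.e. the average number of busy CPUs."""
    return (user_s + sys_s) / real_s


# Values copied from the time(1) output in the transcript.
zgrep_util = cpu_utilization(43.818, 44.060, 6.874)   # about 1.16 CPUs busy
grep_util = cpu_utilization(247.008, 9.826, 3.843)    # about 0.055 CPUs busy

print(f"zgrep: {zgrep_util:.2f}  grep: {grep_util:.3f}")
```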
I had an original IBM PC XT (used) with a 10MB full-height MFM hard drive (twice the height of today's 5.25" drives). It had about 3MB of available disk space and took, I swear, 6+ minutes to boot.

It actually ran faster double-spaced (with Stacker) and had nearly 12MB of available space. Surprisingly enough, programs loaded without any problems, which became more of an issue after moving to a 486.

Yeah, when your storage is slow relative to the CPU, the CPU can afford to run compression, and you can get impressive gains in both space and performance.