by dilatedmind 2000 days ago
my intuition is you can save cpu time when compressing before sending over the network (and certainly wall time).

a quick test copying a 24M file (with similar compression ratios) to s3 showed a 6% decrease in cpu time when piping through gzip.

Depends on selected compressor, but yes, you can. I've definitely observed zstd-1 to be a net savings, where compression/decompression costs were offset by pushing fewer bytes through the RPC and network layers - and this was only from observing the endpoints, not even counting CPU on intermediate proxies/firewalls/etc.

I wouldn't normally expect gzip to be a net savings (it's comparatively more expensive), but depending on compression ratio achieved and what layers you're passing the bytes through, I'd definitely believe it can be in some contexts.
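To make the break-even reasoning above concrete, here is a minimal sketch of the trade-off, not a measurement. It assumes every cost can be expressed as CPU seconds per original byte; the function name, parameter names, and the example throughputs are all hypothetical and chosen only for illustration.

```python
def compression_saves_cpu(codec_cost, downstream_cost, ratio):
    """Return True if compressing first lowers total CPU per original byte.

    codec_cost:      CPU seconds per original byte spent compressing
                     (add decompression here if you count both endpoints).
    downstream_cost: CPU seconds per transmitted byte spent in the layers
                     the payload passes through (hashing, TLS, RPC, ...).
    ratio:           compressed_size / original_size, between 0 and 1.
    """
    with_compression = codec_cost + ratio * downstream_cost
    return with_compression < downstream_cost


if __name__ == "__main__":
    # Hypothetical figures: a fast codec at 500 MB/s in front of a
    # 200 MB/s downstream stack, shrinking the data to 30% of its size.
    print(compression_saves_cpu(1 / 500e6, 1 / 200e6, 0.3))
    # Hypothetical figures: a 10 MB/s codec in front of a 1 GB/s stack.
    print(compression_saves_cpu(1 / 10e6, 1 / 1e9, 0.3))
```

Rearranged, compression wins when codec_cost < (1 - ratio) * downstream_cost, so a slower compressor needs either a better ratio or a more expensive downstream path to pay for itself.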

Data sent to S3 is usually hashed (depending on authentication type) in addition to being transport encrypted; I imagine the majority of the cost here is the encryption of a larger payload (which many would consider indispensable, but I point this out because I do not generally assume encryption when I merely consider "over the network").

You assume incorrectly. SSL encryption is on the order of 1 GB/s on a recent CPU with AES instructions (anything from this decade).

Gzip is on the order of 10 MB/s with default settings, down to 1 MB/s with the strongest compression setting. It's really, really slow.

GNU gzip, the application, is slow (on the order of 10 MB/s) because of how it does file I/O, but the DEFLATE algorithm that gzip is based on is much faster than 10 MB/s at the default "level 6". For example, the slz implementation of DEFLATE compresses text at 1 GB/s [1]. Even the fairly common zlib implementation can compress text at close to 300 MB/s.

[1] http://www.libslz.org/

DEFLATE at level 6 really is doing 10 MB/s; it doesn't matter whether you're using gzip, zlib, or another library.

slz is closer to level 20 (if there were a level 20). It's fast but the compression ratio is meh. You're better off using lz4 or zstd.
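
The throughput figures disputed in this subthread are easy to check on your own hardware. Below is a rough sketch using Python's standard zlib module, which wraps the zlib library. The helper name and the synthetic sample are assumptions made for illustration; the sample is highly repetitive, which flatters both speed and ratio, so substitute a file representative of your own data before drawing conclusions.

```python
import time
import zlib


def deflate_throughput(data, level, repeats=3):
    """Return (MB/s, compressed/original ratio) for zlib at the given level.

    Takes the best wall-clock time over a few repeats to reduce noise.
    """
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6, len(compressed) / len(data)


if __name__ == "__main__":
    # Synthetic sample (an assumption); real files will behave differently.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 200_000
    for level in (1, 6, 9):
        mbps, ratio = deflate_throughput(sample, level)
        print(f"level {level}: {mbps:.0f} MB/s, ratio {ratio:.3f}")
```

This measures in-memory compression only, so it leaves out file I/O and process startup, which is the distinction drawn above between the gzip application and the DEFLATE algorithm.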