by lucb1e
1458 days ago
What does "ratio (bpb)" mean? I'd guess bytes-per-byte or something, like how many bytes of original you get for each byte of compressed output, but it doesn't work out: the original size is 1e9 bytes and the compressed size is (rounded) 3.2e8, so that's a ratio of about 3.1 (3.09989 with the unrounded size). The program size amounts to a rounding error on that figure. The bpb value given is 2.58, nowhere near 3.1. Edit: the paper defines it as "bits per input byte". What kind of measure is that? It's like "how well did it compress as compared to a factor 8". Why 8?!
Bytes, on the other hand, are entirely arbitrary. At some point, the industry converged on groups of 8 bits as the primary semantically meaningful unit smaller than a word. Probably because people at that time thought that having 256 distinct characters would be more or less the right choice, and because groups with a power-of-2 number of bits are convenient at the hardware level.
Entropy is usually expressed as bits per symbol (or bits per character), because that's what you get when you sum -P(c) log2 P(c) over all symbols c. People who are used to that convention often extend it to expressing compression ratios. Using bits per byte is rare, because bytes are rarely semantically meaningful.
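To make the two conventions concrete, here is a rough Python sketch of how they relate. The function names are made up for illustration and the sizes are the rounded figures quoted in the question, not exact values from the paper. It shows that bpb is just 8 times compressed size over original size, so 2.58 bpb and a ratio of about 3.1 are the same measurement, and it computes the entropy sum for a byte string.

```python
import math
from collections import Counter

def bits_per_byte(original_bytes, compressed_bytes):
    # Compressed size in bits, divided by the number of input bytes.
    return 8 * compressed_bytes / original_bytes

def ratio_from_bpb(bpb):
    # Each input byte is 8 bits, so the size ratio is 8 / bpb.
    return 8 / bpb

def entropy_bits_per_symbol(data):
    # Sum of -P(c) * log2 P(c) over the distinct symbols c in data
    # (here the symbols are bytes, which is an assumption of this sketch).
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(bits_per_byte(1e9, 3.2e8))         # bpb from the rounded sizes
print(ratio_from_bpb(2.58))              # ratio implied by the reported bpb
print(entropy_bits_per_symbol(b"aabb"))  # two equally likely symbols
```

With the rounded compressed size the first line gives 2.56 rather than 2.58; the small gap comes from the rounding of 3.2e8.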