by jltsiren 1457 days ago
The bit is the most fundamental unit of information. A base-e unit might be more elegant from a certain mathematical perspective, but the connections to formal logic and the ease of implementation make the base-2 bit the natural choice, at least when talking about things like information, entropy, and compression.

Bytes, on the other hand, are entirely arbitrary. At some point, the industry converged on groups of 8 bits as the primary semantically meaningful unit smaller than a word. Probably because people at that time thought that having 256 distinct characters would be more or less the right choice. And because groups with a power-of-2 number of bits are convenient at the hardware level.

Entropy is usually expressed as bits per symbol (or bits per character), because that's what you get when you sum -P(c) log2 P(c) over all symbols c. People who are used to that convention often extend it to expressing compression ratios. Using bits per byte is rare, because bytes are rarely semantically meaningful.
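
To make the bits-per-symbol convention concrete, here is a minimal Python sketch. It assumes the symbols are the characters of a string and that P(c) is estimated from their observed frequencies (the empirical zeroth-order entropy). The function names are made up for illustration and do not come from any particular library.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Sum -P(c) * log2 P(c) over all distinct symbols c in text.

    P(c) is estimated as the relative frequency of c (an assumption of
    this sketch; a real model could use any probability estimate).
    """
    counts = Counter(text)
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def compressed_bits_per_symbol(compressed_size_bytes: int, num_symbols: int) -> float:
    """Express a compression result in the same unit: bits per input symbol."""
    return 8 * compressed_size_bytes / num_symbols


# Four equally likely symbols need 2 bits each.
print(entropy_bits_per_symbol("abcd"))
# 1000 symbols compressed into 250 bytes is 2 bits per symbol.
print(compressed_bits_per_symbol(250, 1000))
```

Swapping math.log2 for math.log would give the same quantity in the base-e unit instead of bits.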