by anderskaseorg 1825 days ago
The right way to measure the area ratio is using entropy. An optimal encoding would save at most 3% area over Base45:

Base64 in binary: log₂ 64 / 8 = 75.000% efficient

Base45 in alphanumeric: 4 log₂ 256 / 33 = 96.970% efficient

Optimal numeric: 3 log₂ 10 / 10 = 99.658% efficient

Optimal alphanumeric: 2 log₂ 45 / 11 = 99.852% efficient

Optimal binary (ISO 8859-1): log₂ 191 / 8 = 94.718% efficient

Optimal binary (UTF-8, single-byte subset): log₂ 128 / 8 = 87.500% efficient

Optimal binary (UTF-8, full): 1 = 128 / 2^(8α) + 1920 / 2^(16α) + 63488 / 2^(24α) + 1048576 / 2^(32α) ⇒ α = 89.706% efficient

Optimal kanji (JIS X 0208): log₂ 6879 / 13 = 98.061% efficient

The mistake in your 39% calculation is that you forgot to take logarithms before calculating the ratio.
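For anyone who wants to check the figures in the list, here is a minimal Python sketch that recomputes them. The names `figures` and `utf8_alpha` are made up for illustration, and the bisection solver is just one convenient way to find α in the full UTF-8 equation, not necessarily how the number was originally obtained.

```python
import math

# Efficiency = bits of payload carried / bits of capacity consumed.
# Each entry mirrors one formula from the list in the comment.
figures = {
    "Base64 in binary": math.log2(64) / 8,
    "Base45 in alphanumeric": 4 * math.log2(256) / 33,
    "Optimal numeric": 3 * math.log2(10) / 10,
    "Optimal alphanumeric": 2 * math.log2(45) / 11,
    "Optimal binary (ISO 8859-1)": math.log2(191) / 8,
    "Optimal binary (UTF-8, single-byte subset)": math.log2(128) / 8,
    "Optimal kanji (JIS X 0208)": math.log2(6879) / 13,
}


def utf8_alpha(tol: float = 1e-12) -> float:
    """Solve 1 = sum(n_k / 2^(8*k*alpha)) for alpha by bisection.

    n_k is the number of k-byte sequences, using the counts quoted
    in the comment (hypothetical helper, not from the original post).
    """
    counts = {1: 128, 2: 1920, 3: 63488, 4: 1048576}

    def total(alpha: float) -> float:
        return sum(n * 2.0 ** (-8 * k * alpha) for k, n in counts.items())

    # total() decreases as alpha grows: total(0.5) > 1 and total(1.0) < 1.
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


figures["Optimal binary (UTF-8, full)"] = utf8_alpha()

for name, value in figures.items():
    print(f"{name}: {value * 100:.3f}% efficient")
```

The printed values should agree with the list to within rounding in the last digit.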

1 comment

Ah hah, yes, the 39% was linear but needed to be log. Thanks for that, and all the other figures too.