|
So, it’s using a 45-character alphabet which matches the QR code alphanumeric values table, which lets the QR code encoder switch to a more efficient mode that takes less space. I just tried rendering a QR code of the 692 characters of the introductory paragraph (with lines joined appropriately), and compared it with a QR code of the same text, uppercased and with out-of-range characters `,`, `[` and `]` changed to %. This reduced an 89×89 code down to 77×77, a 24% reduction in area. If this is roughly the ratio, then Base45-encoding binary data by QR code will yield roughly 17% area savings compared with Base64. (Base45 gets 50% bloat, then 24% shrink = multiple of 1.14; Base64 gets 33% bloat = multiple of 1.33; Base45 / Base64 = 1.14 / 1.33 = 0.83.) [Edit:
edflsafoiewq’s figures on 40-L codes at https://news.ycombinator.com/item?id=27627915 come to about 23% savings, markedly more than my 17%.] I can’t help but wonder if any of the other modes could be more efficient still—numeric mode, kanji mode and byte mode. Of course, the specs for all this are ISO specs, so I can’t read them without coughing up the moolah. In case it’s not clear, I am utterly inexpert in this domain. Further thoughts: Base45 encoding two octets in three characters is pretty wasteful: 45³ ÷ 256² ≈ 1.39, which is 39% waste. (By contrast, Base64 is 100% efficient with its alphabet: 64⁴ = 256³.) This means that if you were willing to do more complex encoding and decoding, you could shrink your QR code by roughly 39% more—to about 52% of the size of the Base64, rather than 83%. Leaving such a huge gap on the table puzzles me—I’d have thought that either you’d want something simple (where Base64 is well-understood) or want to minimise your QR codes, and Base45 sits in an awkward place in the middle. For UTF-8, base-128 will be the most efficient you get. That’ll be ~14% inflation (7 bytes in 8 characters). Which… huh, that looks to be within ε of Base45’s 50% bloat and 24% shrinkage. Not sure if that’s a coincidence or not because I don’t know how alphanumeric mode versus byte mode works in QR codes. But this suggests that alphanumeric mode and Base45-but-not-wasteful would be markedly more efficient than byte mode and Base-122. Still leaves numeric and kanji modes open as possibilities. Again, I’m inexpert and don’t know how the encodings are actually done, and that’ll matter. On edflsafoiewq’s 40-L figures: Base64 gets 2214 bytes, Base45 gets 2864 bytes, optimally-efficient base-45 would get log₂₅₆ 45⁴²⁹⁶ = 2949 bytes, only around 3% more. I think I must have made a mistake somewhere with some of my numbers. |
Base64 in binary: log₂ 64 / 8 = 75.000% efficient
Base45 in alphanumeric: 4 log₂ 256 / 33 = 96.970% efficient
Optimal numeric: 3 log₂ 10 / 10 = 99.657% efficient
Optimal alphanumeric: 2 log₂ 45 / 11 = 99.851% efficient
Optimal binary (ISO 8859-1): log₂ 191 / 8 = 94.718% efficient
Optimal binary (UTF-8, single-byte subset): log₂ 128 / 8 = 87.500% efficient
Optimal binary (UTF-8, full): 1 = 128 / 2^(8α) + 1920 / 2^(16α) + 63488 / 2^(24α) + 1048576 / 2^(32α) ⇒ α = 89.706% efficient
Optimal kanji (JIS X 0208): log₂ 6879 / 13 = 98.061% efficient
The mistake in your 39% calculation is that you forgot to take logarithms before calculating the ratio.