Hacker News new | ask | show | jobs
by thequux 482 days ago
One of the things that Data Matrix got right was being able to shift between encoding regimes mid-stream. Many character sets can be represented in radix-40 (so three characters per two bytes), and the occasional capital character can be handled by a shift byte. If you have a long string of digits, they can be encoded in 4 bits/char. You can even put raw binary in there if need be
2 comments

A QR Code consists of a sequence of segments. Each segment has a mode - numeric, alphanumeric, kanji, or byte. It is possible to shift between encoding regimes by ending a segment and beginning a new segment with a different mode. https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...

Some 1D barcodes have inline shift symbols like you said for Data Matrix, though. e.g. https://en.wikipedia.org/wiki/Code_128

I was going to reply to point that out. I'm not surprised you're the one to point it out first!
I seems to me best approach would be to compress the contents with a Huffman code or some other entropy encoding. All this business of restricted character sets is just an ad-hoc way of reducing the size of each symbol and we've got much more mature solutions for that.
For entropy codes to be effective for such short strings you need a shared initial probability table. And if you have that you are effectively back at special encoding modes for each character set.