Hacker News new | ask | show | jobs
by userbinator 3494 days ago
The use of codepoints below 32 (space, start of what's usually considered "printable") makes me a bit hesitant. A lot of systems won't preserve those characters. Base85 is a more efficient alternative to base64, and doesn't use that lower range:

https://en.wikipedia.org/wiki/Ascii85

4 comments

base 85 also has the interesting property that 4 original bytes fit in 5 encoded bytes. Depending on your processor's memory model and the cost of multiplies compared to shifts this can make it the best performer.

This was true on Vax 8200 hardware back in the day. In the same software, with Huffman decoding of JPEGs it was also fastest to create a finite state machine with an 8 bit symbol size. I suspect that is no longer true since it would kill your L1 cache and be well into your L2 cache on modern x86 machines. It is probably better to take the instruction count hit and process as bits or nibbles.

base 85 also has the interesting property that 4 original bytes fit in 5 encoded bytes.

Still useful for Javascript, as the bit shift operators work on 32 bit "registers".

I was going to bring up base 85 as well, its a better choice for a variety of reasons. A long time ago I wrote a base encoder class in Java[1] mostly so that we could write a netnews reader in Java but also because I felt UUEncoding was not robust. The challenges of using unprintable characters is a lot more of a headache than anyone pays attention to initially. Lots (and I mean quite a few here) of systems consider unprintable characters "safe" to re-purpose into random uses. One display vendor had them changing the color of future characters in the display as an example.

Stick with the characters that nearly everyone assumes could legitimately come up in a document and your chances of running afoul of some "creative genius" who decided "Hey its unprintable so no one will try to print it, but when I do print it I want this thing to happen..."

[1] http://grepcode.com/file/repository.grepcode.com/java/root/j...

I wrote a base 92 encoder for the Javascript game I'm working on:

http://www.emergencevector.com/

It's pretty easy to write the decode for the 0-91 integer in Javascript.

    if (ch == "!") {
        return 57;
    } else {
        return ch.charCodeAt(0) - 35;
    }
It doesn't give you that much usable compactness over base 64, though you can easily encode a 360 degree angle with two bits of precision lost. Also, 5 base 92 characters can fully encode 32 bits of binary data. (Of course, since base 85 can do it in 5 characters.)

I'm probably going to go to typed arrays of 32 bit values. Currently, I can encode an entire ship's data in 18 bytes, of which 4 characters is a hash id.

I've never come across this one, thanks for the tip. I like the way it doesn't use every character in the range 0x20 to 0x7f. That last one in particular (0x7f or 'DEL') has always seemed problematic to me because it's a weird exceptional case. I've always thought someone must have screwed up with that one way back in the day. Using only '!' to 'u' and hence avoiding ' ' feels right for some reason, and I also like the cute trick of getting just a little bit of compression by using 'z' to represent 32 zero bits.
Holla for ASCII85. Used it many times when some of the base64 chars are unusable. A nice thing to know about when working with old systems