Hacker News
by inghoff 4999 days ago
For human-friendly URIs, I would've liked to see Doug Crockford's Base32 encoding (http://www.crockford.com/wrmg/base32.html) instead of hexadecimal. It's case-insensitive, yet still more compact than hex.
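Here is a minimal Python sketch of the idea. It assumes the number-oriented form described on Crockford's page. The alphabet is his, but the function names are hypothetical and the optional check symbol is left out. It shows the compactness gain: a 128-bit value takes 32 hex digits but only 26 Base32 symbols. It also shows the forgiving decode: case-insensitive, hyphens ignored, O read as 0, and I/L read as 1.

```python
# Sketch of Crockford-style Base32 for non-negative integers.
# Function names are hypothetical; the optional check symbol is omitted.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U

# Decoding is case-insensitive (input is upper-cased first), and the
# easily confused letters O, I and L are read as the digits they resemble.
_DECODE = {ch: i for i, ch in enumerate(CROCKFORD)}
_DECODE.update({"O": 0, "I": 1, "L": 1})


def crockford_encode(n: int) -> str:
    """Encode a non-negative integer, most significant symbol first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 32)
        out.append(CROCKFORD[r])
    return "".join(reversed(out))


def crockford_decode(s: str) -> int:
    """Decode a symbol string, ignoring hyphens and letter case."""
    n = 0
    for ch in s.replace("-", "").upper():
        n = n * 32 + _DECODE[ch]  # KeyError for symbols outside the alphabet
    return n


if __name__ == "__main__":
    big = 2**128 - 1  # e.g. a 128-bit identifier
    print(len(format(big, "x")), len(crockford_encode(big)))  # 32 26
    print(crockford_decode("1o-il") == crockford_decode("10-11"))  # True
```

Because the aliases fold look-alike letters onto digits, a URI that someone retypes with the wrong case or with an "o" for a "0" still decodes to the same value.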

Another handy trick is to drop the vowels from your alphabet. That costs only 6 characters (counting y) and greatly reduces the chance that your URL will include a noticeable profanity or other undesirable word.
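As a rough illustration, here is a Python sketch. It assumes a hypothetical alphabet of the ten digits plus the 20 lowercase consonants left after removing a, e, i, o, u and y, giving 30 symbols. The function names are also hypothetical.

```python
import string

# Hypothetical vowel-free alphabet: 10 digits + 20 consonants = 30 symbols.
VOWELS = set("aeiouy")
ALPHABET = string.digits + "".join(
    c for c in string.ascii_lowercase if c not in VOWELS
)
BASE = len(ALPHABET)  # 30


def vowelless_encode(n: int) -> str:
    """Encode a non-negative integer using only digits and consonants."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def vowelless_decode(s: str) -> int:
    """Inverse of vowelless_encode; rejects any symbol outside the alphabet."""
    n = 0
    for ch in s.lower():
        n = n * BASE + ALPHABET.index(ch)  # ValueError for vowels etc.
    return n


if __name__ == "__main__":
    token = vowelless_encode(2**64 - 1)
    print(token, vowelless_decode(token) == 2**64 - 1)
```

Since 30 is not a power of two, the sketch converts whole integers rather than mapping fixed bit groups. The cost is small: a 128-bit value needs at most 27 symbols here, versus 26 in Base32.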
The problem with Base32 encodings is that there are so many of them. (I say this as a fan who has promoted Base32 for a number of uses, and I wish the proliferation of encodings could have been avoided.)

For example, Crockford's is different from both of the variants defined in IETF RFC 4648 (http://tools.ietf.org/html/rfc4648#section-6). For comparison, the digit sets for values 0-31 are:

    RFC4648-b32: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
  RFC4648-b32he: 0123456789ABCDEFGHIJKLMNOPQRSTUV
  Crockford-b32: 0123456789ABCDEFGHJKMNPQRSTVWXYZ 
There are even others. Each had locally reasonable grounds for its variation at the time.

I used the first-row approach in the SHA-1 identifiers at Bitzi and in the original 'magnet:' proposal. It had already been documented in an earlier IETF RFC for other purposes, and it is perhaps best anywhere human sight-reading or handwriting could confuse certain characters (it omits the digits 0 and 1, which are easily mistaken for O and I/L). But the second-row variant is easiest to encode and decode. The second- and third-row variants also have a sorting benefit: the encoded strings sort in the same order as the raw binary values.
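To make the comparison concrete, here is a small Python sketch that plugs each alphabet from the table into the same encoder. The encoder reads bytes 5 bits at a time, most significant first, and leaves out RFC 4648's '=' padding. Crockford's page defines his scheme for numbers, so applying his alphabet to byte groups is an assumption made for this comparison, and the helper names are hypothetical. The sketch checks the sorting claim over every equal-length 2-byte input. It also shows one reason the base32hex row is convenient: its symbols are exactly Python's base-32 numerals, so the built-in `int(s, 32)` decodes them.

```python
# One 5-bit-group encoder, three alphabets (symbols for values 0..31 in order).
# RFC 4648 '=' padding is omitted; helper names are hypothetical.
ALPHABETS = {
    "RFC4648-b32": "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    "RFC4648-b32he": "0123456789ABCDEFGHIJKLMNOPQRSTUV",
    "Crockford-b32": "0123456789ABCDEFGHJKMNPQRSTVWXYZ",
}


def b32_encode(data: bytes, alphabet: str) -> str:
    """Map each 5-bit group (most significant first) to a symbol.

    The last group is zero-filled on the right, as in RFC 4648."""
    nbits = len(data) * 8
    pad = -nbits % 5
    bits = int.from_bytes(data, "big") << pad
    nbits += pad
    return "".join(alphabet[(bits >> shift) & 31]
                   for shift in range(nbits - 5, -1, -5))


def preserves_order(alphabet: str, width: int = 2) -> bool:
    """True if encodings of every `width`-byte input sort like the raw bytes."""
    # Inputs are generated in ascending byte order, so the encodings
    # preserve order exactly when they are already sorted.
    encoded = [b32_encode(i.to_bytes(width, "big"), alphabet)
               for i in range(1 << (8 * width))]
    return encoded == sorted(encoded)


def b32hex_decode(s: str, nbytes: int) -> bytes:
    """Base32hex symbols are Python's base-32 numerals, so int() decodes them."""
    pad = -(nbytes * 8) % 5
    return (int(s, 32) >> pad).to_bytes(nbytes, "big")


if __name__ == "__main__":
    for name, alphabet in ALPHABETS.items():
        print(f"{name:>14}: {b32_encode(b'fooba', alphabet)}",
              "sorts like bytes:", preserves_order(alphabet))
    enc = b32_encode(b"magnet", ALPHABETS["RFC4648-b32he"])
    print(b32hex_decode(enc, 6))  # b'magnet'
```

The first row fails the ordering check because its digits 2-7 stand for values 26-31 but sort before the letters in ASCII. The other two alphabets list their symbols in ascending ASCII order.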