by GeertB 2308 days ago
I really like the very high efficiency at the important multiple-of-4-byte increments: using 7 base-24 characters to encode 32 bits is 99.7% efficient. However, I'd recommend 7 base-24 digits followed by a blank as the standard output format, which would allow efficient 8-character <=> 32-bit conversions. I also think padding output to a multiple of 7 characters would be good, for much the same reasons it's good for base-64: you can then concatenate encoded streams the way you can byte streams, and recover the data on decode. Since multiples of 32 bits are so common, padding would rarely be needed in practice. On input it would be fine to accept unpadded base-24 sequences, but valid base-24 output should always be padded to a multiple of 7 chars (not counting the blanks, which are there only for readability and are otherwise not significant).
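To make the arithmetic concrete, here is a rough Python sketch of the 99.7% figure and of the "7 digits plus a blank" grouping. The alphabet is an assumption picked for illustration (the thread does not spell out the project's actual mapping), and the function names are made up for this example.

```python
import math

# Assumed 24-symbol alphabet, for illustration only; not necessarily the
# mapping the project uses.
ALPHABET = "23456789ABCEFGHKPRSTWXYZ"
BASE = len(ALPHABET)      # 24
DIGITS_PER_WORD = 7       # 24**7 = 4,586,471,424 > 2**32 = 4,294,967,296


def efficiency() -> float:
    """Payload bits divided by the capacity of 7 base-24 digits."""
    return 32 / (DIGITS_PER_WORD * math.log2(BASE))


def encode_word(value: int) -> str:
    """Encode one 32-bit value as exactly 7 base-24 digits, most significant first."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    chars = []
    for _ in range(DIGITS_PER_WORD):
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode_word(text: str) -> int:
    """Decode a group of base-24 digits back to an integer."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value


def encode_words(words) -> str:
    """Emit each word as 7 digits followed by a blank, i.e. 8 characters per 32 bits."""
    return "".join(encode_word(w) + " " for w in words)


def decode_words(text: str):
    """Blanks are not significant; split on them and decode each group."""
    return [decode_word(group) for group in text.split()]
```

With this layout every 32-bit word occupies a fixed 8-character slot, so the n-th word of the encoded text starts at offset 8 * n.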

However, I strongly dislike the arbitrary mapping between character values and base-24 digits. There is a strong reason for using the order 2345679ABCEFGHKRSTWXYZ: encoded values then compare the same way as the original binary values. I did appreciate the 0x00000000 == ZZZZZZZ equivalence, but consistent ordering is just way more important IMO. Also 2222222 looks a lot like ZZZZZZZ. Just saying.
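A small Python sketch of the ordering argument, under stated assumptions: the order string quoted above lists 22 symbols, so the sorted alphabet below adds 8 and P to reach 24 (an assumption), and the unsorted alphabet is a hypothetical one that merely maps digit 0 to Z. Neither is claimed to be the project's real mapping.

```python
# Assumed: 24 symbols in ascending ASCII order ('8' and 'P' added to the
# 22 symbols quoted in the comment).
SORTED_ALPHABET = "23456789ABCEFGHKPRSTWXYZ"
# Hypothetical unsorted mapping where digit 0 is 'Z', so 0 encodes as ZZZZZZZ.
UNSORTED_ALPHABET = "Z23456789ABCEFGHKPRSTWXY"


def encode_word(value: int, alphabet: str) -> str:
    """Encode a 32-bit value as 7 fixed-width base-24 digits, most significant first."""
    base = len(alphabet)
    chars = []
    for _ in range(7):
        value, digit = divmod(value, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def preserves_order(alphabet: str, values) -> bool:
    """True if sorting the encoded strings gives the same order as sorting the integers."""
    ordered = sorted(values)
    encoded = [encode_word(v, alphabet) for v in ordered]
    return encoded == sorted(encoded)


samples = [0, 1, 23, 24, 1000, 2**31, 2**32 - 1]
print(preserves_order(SORTED_ALPHABET, samples))    # True
print(preserves_order(UNSORTED_ALPHABET, samples))  # False
```

The property relies on two things together: fixed-width, most-significant-first output and an alphabet whose symbols are in ascending character order. With the hypothetical unsorted mapping, 0 encodes as ZZZZZZZ and 1 as ZZZZZZ2, which sort the wrong way round as strings.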

1 comment

I thought about the comparison bit, and I wanted to go against it.

Ordered, your snippet looks like the alphabet with a few letters missing, and it isn't searchable on Google or anything. I really wanted the alphabet to stand out.

I don't think it is important that it can be sorted: it is intended for randomly generated keys, which in my experience you won't be sorting.