Hacker News new | ask | show | jobs
by KMag 3752 days ago
You can have an equally simple implementation (plus one mask operation) if you put the length encoding in the most significant bits. The advantage of having length in the most significant bit is that in the common case (1 byte integers), the decoding is faster.
1 comments

Are you sure? It does not seem like it will be as simple. When continuation bits are at the top of the first byte, they come between the value bits in the first byte and value bits in the subsequent bytes. This means you have to manipulate them independently, instead of being able to manipulate them as an atomic group. With low continuation bits, all the value bits get to stay together.

If it would be as simple, you should be able to easily modify my sample encoder/decoder above to illustrate.

Oops. You're right for little-endian encoders. (See another comment on this thread for a simple bijective big-endian encoder I whipped up just now.) I've always written big-endian encoders or bijective big-endian encoders, so that byte strings sort the same lexographically and numerically.

Though, a simple loop encoder and decoder are still easily doable if the unary length encoding is in the most significant bits. You're right, though, for a little-endian encoder, it's a slightly more simple to put the unary length encoding in the least significant bits.