|
|
|
|
|
by stebalien
20 days ago
|
|
I've used LEB128 (with canonicalisation) extensively and... this looks so much nicer for most use-cases (length prefixed, supports the full uint64 range without that extra 10th byte). The downside is the encoding size. LEB128 quickly grows to 2 bytes, but stays at 2 bytes all the way to 2^14. This is important if you're using these numbers as tags/identifiers as we were in the multicodec [1] project, or for network message lengths. bijou64 only gives you 500 <= 2 byte numbers. [1]: https://github.com/multiformats/multicodec |
|
If you only want to encode uint64 numbers LEB128 could easily be tweaked to fit in 9 bytes in several ways:
- using the offset trick described in this article would remove non-unique encodings (0x80 0x00 would encode 128)
- never allowing encodings longer than 9 bytes would mean the MSB of any ninth byte would always be zero, so you could reuse that, and store 8 bits in any ninth byte, for a total of 7 bits in each of the first eight bytes plus 8 in the ninth = 64
Both tweaks would lose LEB128’s property that you can find where each number starts from any byte in the stream, but the encoding discussed here doesn’t have that property either.