|
|
|
|
|
by danking00
749 days ago
|
|
Thanks for the reply and clarification! That’s an impressive throughput. Am I interpreting the behavior correctly to think that: your code has one branch per int and that branch will be predicted well when your integers are all of similar magnitude? I never fully groked the Zipf distribution but am I correct to conclude the worst case for vu128 would be alternating integers on either side of the branch condition? Thanks for the link to the slides, also clever! |
|
In terms of bytes/sec, the worst case is a set of integers exclusively in the range [0xF0, 0xFF] -- big enough to skip the fast path, small enough that each byte has to go through the full bit-masking.
In terms of values/sec, the worst is really big integers because writing 8 bytes (or 16, for `encode_u128`) to a buffer takes longer than writing one byte.
However, regardless of measurement, the worst case for vu128 is faster than VLQ/LEB128 -- except for distributions containing only [0x00, 0x7F] in which case they're all equivalent.
---
Also, after some comments on Lobsters about cases where space efficiency in the u16 range really matters, I'm experimenting with a version that tries to match or exceed VLQ/LEB128 for size efficiency at all bit lengths.
It has three more branches, but in practice this hasn't turned out to matter much -- it seems that indefinite loops are slower than a fixed set of branches, so still significantly faster than VLQ/LEB128 on most sets of integers.
It seems to be more fussy about compiler optimizations, though: https://github.com/rust-lang/rust/issues/125543