Hacker News new | ask | show | jobs
by MaskRay 678 days ago
The scheme proposed in this blog post is also called PrefixVarInt.

Signed integers can be represented with either zigzag encoding or sign extension. For the most common one-byte encoding, zigzag encoding is a worse scheme. https://maskray.me/blog/2024-03-09-a-compact-relocation-form...

My blog post is about a relocation format. I investigated a few schemes and concluded that LEB128 is the best for my use case. There are multiple reasons including super simple implementation:

    static uint64_t read_leb128(unsigned char **buf, uint64_t sleb_uleb) {
      uint64_t acc = 0, shift = 0, byte;
      do {
        byte = *(*buf)++;
        acc |= (byte - 128*(byte >= sleb_uleb)) << shift;
        shift += 7;
      } while (byte >= 128);
      return acc;
    }
    
    uint64_t read_uleb128(unsigned char **buf) { return read_leb128(buf, 128); }
    int64_t read_sleb128(unsigned char **buf) { return read_leb128(buf, 64); }
1 comments

  static int64_t read_sprefix(unsigned char **buf) {
    uint64_t x = *(uint64_t *)*buf;
    unsigned n = stdc_trailing_zeros(x) + 1;
    assert(n <= 8); /* handles values up to 2**56 - 1 */
    *buf += n;
    return (int64_t)(x << 64 - 8*n) >> 64 - 7*n;
  }
Seems pretty OK too and doesn’t force a branch per byte. (Imagine a memcpy instead of the first line if the strict-aliasing UB annoys you.) I guess it does do somewhat more work if the majority of the inputs fits in a byte.