Hacker News
by coryfklein 2555 days ago
> a huge repeated int32 stuffed with random numbers.

Is this not exactly the behavior you'd see if you used UUIDs as your keys? Asking honestly.

2 comments

If you knew your keys were uniformly distributed you would never use a varint encoding to store them, because you'd stand a very high chance of encoding them into a field longer than a primitive integer. A varint can only hold 28 bits of number in the first four bytes, so your odds of getting at least 5 bytes of output for a random 32-bit value are 15/16, i.e. very likely. If I really had to encode them I would use either two fixed64 fields, the experimental fixed128 type, or 'bytes' with the exactly-36-byte-long string representation of a UUID. In no case could I imagine packing a huge vector of random numbers into a protobuf int32 field.
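
To make the 28-bit / 15-in-16 arithmetic concrete, here is a minimal Python sketch of base-128 varint encoding for unsigned values. The names `encode_varint`, `MAX_4_BYTE` and `P_FIVE_BYTES` are made up for illustration and are not part of any protobuf library.

```python
from fractions import Fraction


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("this sketch handles unsigned values only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


# Largest value that still fits in four varint bytes: 4 * 7 = 28 bits.
MAX_4_BYTE = (1 << 28) - 1

# Share of all 32-bit values that are too large for four varint bytes.
P_FIVE_BYTES = Fraction((1 << 32) - (1 << 28), 1 << 32)

print(len(encode_varint(MAX_4_BYTE)))      # 4
print(len(encode_varint(MAX_4_BYTE + 1)))  # 5
print(P_FIVE_BYTES)                        # 15/16
```

Only values below 2^28 fit in four bytes, and those make up 1/16 of the 32-bit range, which is where the 15/16 comes from.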
The point is that Protobuf has variable-length ints by default. That’s an optimization for many common use cases, but slower and larger for random data, including GUIDs. Use Protobuf’s fixed ints for those.
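
A rough way to see the size difference is to compare the payload bytes for a batch of random 32-bit values under both encodings. This is a sketch only: it uses no protobuf library, it ignores field tags and length prefixes, and `varint_size`, `values`, `varint_total` and `fixed_total` are illustrative names.

```python
import random
import struct


def varint_size(n: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


# Fixed seed so the run is repeatable; the values stand in for random keys.
rng = random.Random(0)
values = [rng.getrandbits(32) for _ in range(10_000)]

# Payload size if every value is written as a varint.
varint_total = sum(varint_size(v) for v in values)

# Payload size if every value is written as a 4-byte little-endian fixed32.
fixed_total = len(struct.pack(f"<{len(values)}I", *values))

print(varint_total, fixed_total)
```

With uniformly random input the varint total should come out close to five bytes per value against exactly four for fixed32. For a signed int32 field the gap is wider, since protobuf writes negative values as ten-byte varints.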