by pegasuscollins 3156 days ago

Pretty much every binary format encodes integers in "base 2", using either a fixed-width or a variable-length (varint) scheme.

This is generally done for two major reasons. First of all, such an encoding is significantly easier and cheaper to parse than a base10+ascii (human-readable) encoding.

I encourage you to write a parser that reads a fixed-width, 32-bit binary number (1) and another parser that reads a JSON-formatted number string into an internal variable, in a classic language like Java or C++. You will immediately see the big difference in complexity. Make sure your parser can also deal with a message that contains more than one number, i.e. the parser should be able to tell at which byte index an encoded number begins and ends. Even if you're using a language or library where this is hidden from you (e.g. by using parseInt or std::stod), the same work still happens behind the scenes.
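To make the comparison concrete, here is a rough sketch of both readers in Python (picked for brevity, the same logic carries over to Java or C++). The function names `read_u32_be` and `read_decimal_ascii` are made up for this example. It assumes unsigned big-endian for the binary case, and the text reader only handles plain integers, so a real JSON number parser with fractions and exponents would be more involved still.

```python
from typing import Tuple


def read_u32_be(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned 32-bit big-endian integer starting at pos.

    Returns (value, index of the first byte after the number).
    """
    if pos + 4 > len(buf):
        raise ValueError("truncated input")
    value = (
        (buf[pos] << 24)
        | (buf[pos + 1] << 16)
        | (buf[pos + 2] << 8)
        | buf[pos + 3]
    )
    # The end index is known up front: always four bytes further.
    return value, pos + 4


def read_decimal_ascii(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an optionally negative base-10 ASCII integer starting at pos.

    Returns (value, index of the first byte after the number).
    """
    start = pos
    negative = pos < len(buf) and buf[pos] == ord("-")
    if negative:
        pos += 1
    first_digit = pos
    value = 0
    # The end of the number is only found by inspecting every byte.
    while pos < len(buf) and ord("0") <= buf[pos] <= ord("9"):
        value = value * 10 + (buf[pos] - ord("0"))
        pos += 1
    if pos == first_digit:
        raise ValueError(f"expected a digit at index {start}")
    return (-value if negative else value), pos
```

The binary reader does one bounds check and a few shifts. The text reader needs a loop, a branch per byte, a multiply per digit and explicit error handling, and it cannot know where the number ends without scanning for the first non-digit.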

The other reason is that for most numbers and fixed/varint encoding schemes, the "binary" representation will be much more compact. Storing the number "1000000" in base10+ascii (human readable) takes at least seven bytes, one per digit, plus whatever delimiter or quoting the format needs. Storing the same number in a fixed 32-bit integer encoding takes four bytes. Using a varint encoding scheme might allow you to get down to three bytes.
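A quick way to check those sizes yourself, sketched in Python. The helper `encode_varint` is a hypothetical name, and it assumes the common "7 bits per byte, high bit means more bytes follow" varint layout (as used by LEB128 and Protocol Buffers). Other varint schemes exist and may give different lengths.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first.

    The high bit of each byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)  # last byte: high bit clear
            return bytes(out)


n = 1000000
ascii_len = len(str(n).encode("ascii"))  # digits only, no delimiter
fixed_len = len(struct.pack(">I", n))    # unsigned 32-bit, big-endian
varint_len = len(encode_varint(n))

print(ascii_len, fixed_len, varint_len)  # prints "7 4 3"
```

Note that the text count covers the digits only. In a real message you also pay for a separator, quotes or a length prefix so the parser can find the end of the number.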

(1) Ignoring stuff like byte order and representation of negative numbers; this is usually fixed in the protocol/format specification.