by mikeash 4461 days ago
The only problem with length prefixing is that it interferes with streaming data, because you need to know the full length in advance. Thus HTTP chunked encoding. Still, it works great in most scenarios.
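
To make the streaming point concrete, here is a rough sketch in Python. It is not the HTTP wire format (real chunked encoding writes the chunk size as hex text followed by CRLF); the 4-byte big-endian length header and the names `frame_whole`, `frame_chunked`, and `read_chunked` are assumptions made up for illustration. The idea is the same, though: prefix each piece as it arrives and use a zero-length piece to mark the end.

```python
import struct
from typing import Iterable, Iterator


def frame_whole(payload: bytes) -> bytes:
    """Plain length prefixing: the whole payload must exist before sending."""
    return struct.pack(">I", len(payload)) + payload


def frame_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Chunk-style framing: each piece gets its own length prefix as it arrives."""
    for piece in pieces:
        if piece:  # an empty piece would look like the terminator, so skip it
            yield struct.pack(">I", len(piece)) + piece
    yield struct.pack(">I", 0)  # zero-length chunk marks the end of the message


def read_chunked(stream: bytes) -> bytes:
    """Reassemble a message written by frame_chunked."""
    out = bytearray()
    pos = 0
    while True:
        (size,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        if size == 0:
            return bytes(out)
        out += stream[pos:pos + size]
        pos += size
```

The sender of `frame_chunked` never needs the total length, only the length of the piece currently in hand.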

My favorite way to deal with this stuff is Consistent Overhead Byte Stuffing:

http://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffi...

In short, you take the data and encode it with a clever scheme that effectively escapes all the zero bytes. The output contains no zeroes, yet the encoding adds almost no overhead: the worst case is one extra byte per 254 bytes of input (about 0.4%), and the best case is a single extra byte. (Compare to e.g. backslash escapes of quotes in quoted strings, where the worst case doubles the output size.) You then use the now-eliminated zero byte as your record separator. This lets you stream data (with a small amount of buffering to perform the encoding) while still easily locating the ends of chunks.
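
For anyone who wants to poke at it, here is a minimal Python sketch of the scheme as I understand it from the Wikipedia description, not a vetted implementation. The function names (`cobs_encode`, `cobs_decode`, `split_frames`) are my own. Each code byte says how many bytes to the next zero (or to the next code byte, when the code is 0xFF), so the zeroes themselves never appear in the output.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()          # non-zero bytes seen since the last code byte
    ended_on_full_block = False  # True if the last thing emitted was a 0xFF block
    for byte in data:
        ended_on_full_block = False
        if byte == 0:
            # Code = distance to this zero; the zero itself is not emitted.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Full block: code 0xFF means "254 bytes follow, no implied zero".
                out.append(0xFF)
                out += block
                block.clear()
                ended_on_full_block = True
    if not ended_on_full_block:
        out.append(len(block) + 1)
        out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode for a single frame (without the zero delimiter)."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        block = encoded[i + 1:i + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS frame")
        out += block
        i += code
        # Any code below 0xFF stands for a zero, except at the very end.
        if code < 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)


def split_frames(stream: bytes) -> list[bytes]:
    """Split a zero-delimited stream into decoded records."""
    return [cobs_decode(frame) for frame in stream.split(b"\x00") if frame]
```

A sender would write `cobs_encode(record) + b"\x00"` for each record, and a receiver that joins mid-stream can just skip ahead to the next zero byte to find a record boundary.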

I've played around with COBS but never used it in a real product, so this is not entirely the voice of experience here. But it is a nifty system.

1 comments

that is just freaking cool. took me about 4 reads to grok it. it sort of reminds me of utf-8, and how you can synchronize that easily.