by derefr 4462 days ago
Examples of length-prefixed data abound in protocols and formats defined by systems and telecom engineers (e.g. the IETF). IP packets are length-prefixed. ELF-binary tables and sections are length-prefixed. PNG chunks are length-prefixed.

It's just these worse-is-better text-based protocols like HTTP, created by application developers, that toss all the advantages of length-prefixing away. (And, even then, HTTP bodies are length-prefixed, with the Content-Length header. It's just the headers that aren't.)


The only problem with length prefixing is that it interferes with streaming data, because you need to know the full length in advance. Hence HTTP chunked encoding, which length-prefixes each chunk rather than the whole body. Still, plain length prefixing works great in most scenarios.
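To make the streaming workaround concrete, here is a minimal sketch of per-chunk length prefixing. It only follows the idea behind chunked encoding and is not HTTP's wire format (HTTP writes each chunk size as hexadecimal text followed by CRLF). The 4-byte big-endian prefix and the names `write_chunked`, `read_exact` and `read_chunked` are assumptions made up for this illustration.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def write_chunked(out: BinaryIO, chunks: Iterable[bytes]) -> None:
    """Write each chunk with its own length prefix, then a zero-length terminator.

    The total length is never needed up front, so the producer can stream.
    The 4-byte big-endian prefix is an assumption of this sketch.
    """
    for chunk in chunks:
        if chunk:  # a zero-length chunk is reserved as the end marker
            out.write(struct.pack(">I", len(chunk)))
            out.write(chunk)
    out.write(struct.pack(">I", 0))


def read_exact(src: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise if the stream ends early."""
    data = src.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a frame")
    return data


def read_chunked(src: BinaryIO) -> Iterator[bytes]:
    """Yield chunks until the zero-length terminator is seen."""
    while True:
        (size,) = struct.unpack(">I", read_exact(src, 4))
        if size == 0:
            return
        yield read_exact(src, size)


# The producer hands over pieces as they become available.
buf = io.BytesIO()
write_chunked(buf, iter([b"hel", b"lo, ", b"world"]))
buf.seek(0)
assert b"".join(read_chunked(buf)) == b"hello, world"
```

The reader still jumps straight from one length field to the next without scanning the payload, which is the main advantage of length prefixing that this keeps.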

My favorite way to deal with this stuff is Consistent Overhead Byte Stuffing:

http://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffi...

In short, you take the data and encode it with a clever scheme that effectively escapes all the zero bytes. The output contains no zeroes, yet the overhead is tiny: the worst case adds about one byte per 254 bytes of input (roughly 0.4%), and the best case adds a single byte. (Compare to e.g. backslash escapes of quotes in quoted strings, where the worst case doubles the output size.) You then use the now-eliminated zero byte as your record separator. This lets you stream data (with a small amount of buffering to perform the encoding) while still easily locating the ends of chunks.
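For anyone who wants to poke at it, here is a rough sketch of the basic COBS scheme as I understand it, not a tuned or reference implementation. Each block starts with a code byte giving the distance to the next zero; a code of 0xFF means "254 data bytes follow and no zero is implied". The function names `cobs_encode`, `cobs_decode` and `frame` are made up for this example.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()  # non-zero bytes collected since the last code byte
    for byte in data:
        if byte == 0:
            # Code byte = block length + 1; the zero itself is implied.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Full block: code 0xFF, no implied zero after it.
                out.append(0xFF)
                out += block
                block.clear()
    # Final block (possibly empty) closes the message.
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode; raise ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside a COBS-encoded message")
        block = encoded[i + 1 : i + code]
        if len(block) != code - 1 or 0 in block:
            raise ValueError("truncated or corrupt COBS block")
        out += block
        i += code
        # Every code except 0xFF implies a zero, unless it ends the message.
        if code != 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)


def frame(payload: bytes) -> bytes:
    """Encode one record and terminate it with the zero separator."""
    return cobs_encode(payload) + b"\x00"


records = [b"hello\x00world", b"", b"\x00\x00"]
stream = b"".join(frame(r) for r in records)

# The stream ends with a separator, so drop the empty piece after the last split.
decoded = [cobs_decode(f) for f in stream.split(b"\x00")[:-1]]
assert decoded == records
```

Since a zero can only ever be a separator, a reader that joins the stream mid-record just skips ahead to the next zero and starts decoding from there.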

I've played around with COBS but never used it in a real product, so this is not entirely the voice of experience here. But it is a nifty system.

that is just freaking cool. took me about 4 reads to grok it. it sort of reminds me of UTF-8, and how you can resynchronize on that easily.
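The UTF-8 comparison can be shown in a few lines. This is just a sketch of the resynchronization trick, with a made-up helper name `resync_utf8`: continuation bytes always look like 10xxxxxx, so a reader dropped at an arbitrary offset skips them until it reaches a byte that starts a character.

```python
def resync_utf8(data: bytes, start: int) -> int:
    """Return the index of the first character boundary at or after start."""
    i = start
    # Continuation bytes have the top two bits set to 10.
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


text = "naïve café".encode("utf-8")

# Offset 3 lands in the middle of the two-byte "ï"; skip ahead to "v".
boundary = resync_utf8(text, 3)
assert text[boundary:].decode("utf-8") == "ve café"
```

COBS framing gives the same property at the record level: scan forward to the next zero byte and you are at a boundary.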