by poizan42 3491 days ago
A lot of people have experimented with a lot of different ways of encoding binary data as printable text. Wikipedia has a list of different encoding schemes[0].

The most efficient one is yEnc[1]. Still, the simplest ones, such as base64 or good old hex, may actually work better once compression comes into the picture.

[0]: https://en.wikipedia.org/wiki/Binary-to-text_encoding

[1]: https://en.wikipedia.org/wiki/YEnc

3 comments

It's crucial to evaluate encoding space usage in the context of compression. For instance, gzip(base16(data)) is often smaller than gzip(base64(data)) for practical data. Even though base64 is more efficient than base16, each base64 character carries 6 bits, so it breaks up data across byte boundaries: a byte sequence that repeats in the input does not necessarily repeat in the encoded text. That makes gzip significantly less efficient.
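A rough way to check this on your own data is sketched below. The function name `compressed_sizes` and the sample input are made up for illustration, and the outcome depends heavily on what the data looks like, so treat it as a measuring tool rather than proof of the claim.

```python
import base64
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of each encoding, before and after gzip."""
    hex_text = base64.b16encode(data)   # 2 output chars per input byte
    b64_text = base64.b64encode(data)   # 4 output chars per 3 input bytes
    return {
        "base16": len(hex_text),
        "base64": len(b64_text),
        "gzip(base16)": len(gzip.compress(hex_text)),
        "gzip(base64)": len(gzip.compress(b64_text)),
    }


if __name__ == "__main__":
    # Hypothetical sample: repetitive, text-like bytes. Swap in real data.
    sample = b"".join(b"record %d: status=ok\n" % i for i in range(2000))
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>14}: {size}")
```

Running it against a few representative files is probably more informative than any single synthetic sample.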
When would you gzip encoded data instead of encoding gzipped data? Doesn't gzip after encoding defeat the whole idea of encoding the data in a format that won't get mangled by systems that expect to be handling text?
When serving gzip-compressed pages to browsers that support it.
He meant that you typically use base64 when the medium you use (e.g. email) doesn't support binary data. When you compress base64 encoded data you get back binary output. If binary output is ok to transfer, then why would you use base64 in the first place? Why not just compress the raw data?
If you're embedding encoded data in another file format that imposes restrictions on it. The encoding in the article is very explicitly optimized to be embedded in HTML attributes, which have a limited character range. The full HTML document is later compressed for transport, over a protocol that a) is aware of the compression and b) can transport binary data.
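The two layers described in this exchange can be sketched as follows. The `data-blob` attribute and the helper names are hypothetical, and base64 stands in for the article's encoding, which is not reproduced here. The encoded payload has to be text-safe inside the HTML, while the gzip step only happens at the transport layer, where binary bytes are fine.

```python
import base64
import gzip


def is_printable_ascii(blob: bytes) -> bool:
    """True if every byte is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= b <= 0x7E for b in blob)


def build_page(payload: bytes) -> bytes:
    """Embed payload, base64-encoded, in a hypothetical data-blob attribute."""
    attr = base64.b64encode(payload).decode("ascii")
    return f'<div data-blob="{attr}"></div>'.encode("ascii")


payload = bytes(range(256))      # arbitrary binary data
page = build_page(payload)       # text-safe: fits inside an HTML attribute
wire = gzip.compress(page)       # what a gzip-aware transport would carry

# Round trip: undo the transport compression, then decode the attribute.
restored_page = gzip.decompress(wire)
attr_value = restored_page.split(b'"')[1]
restored_payload = base64.b64decode(attr_value)

print(is_printable_ascii(page), is_printable_ascii(wire))
print(restored_payload == payload)
```

The compressed bytes are never embedded anywhere that expects text, which is why compressing after encoding does not defeat the purpose here.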
Came here to post this... yEnc is about as compact as you can get, and has been around for over a decade.

Besides that, the Z85 encoding is the runner-up as a compact "string safe" encoding: https://rfc.zeromq.org/spec:32/Z85/

yEnc is unsuitable when the text will be transferred as UTF-8.
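One way to see the UTF-8 problem is the sketch below. It is a simplified yEnc-style encoder (no headers, line wrapping, or CRC), and it assumes the 8-bit output gets treated as Latin-1 text and re-encoded as UTF-8. That is only one of several ways the bytes could end up in a UTF-8 stream. Under that assumption, every output byte of 0x80 or above costs two bytes.

```python
import base64

# Simplified yEnc-style escaping: NUL, LF, CR and '=' are critical characters.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def yenc_body(data: bytes) -> bytes:
    """Encode data yEnc-style: add 42 mod 256, escape critical bytes with '='."""
    out = bytearray()
    for b in data:
        e = (b + 42) % 256
        if e in CRITICAL:
            out.append(0x3D)             # escape marker '='
            out.append((e + 64) % 256)   # shifted value of the critical byte
        else:
            out.append(e)
    return bytes(out)


def utf8_size(blob: bytes) -> int:
    """Size after treating each byte as a Latin-1 code point and encoding as UTF-8."""
    return len(blob.decode("latin-1").encode("utf-8"))


data = bytes(range(256))                 # every byte value once
encoded = yenc_body(data)

print("yEnc, 8-bit clean:", len(encoded))
print("yEnc, as UTF-8:   ", utf8_size(encoded))
print("base64:           ", len(base64.b64encode(data)))
```

For input where all byte values are equally likely, roughly half of the yEnc output lands above 0x7F, so the compactness advantage shrinks or disappears once UTF-8 is involved.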