by ubernostrum 3210 days ago
That section was written for people who know little to nothing about Unicode and the ways Unicode can be encoded to bytes. So it starts with the obvious approach -- just spit out a sequence of bytes whose integer values are the code points, which is near enough as makes no difference to how UTF-32 works -- then introduces variable-width encoding through the history of UCS-2 and UTF-16, then gets to UTF-8 and what motivated it.
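
To make that progression concrete, here's a rough Python sketch (not from the post; the helper name `encoded_hex` is made up for illustration) that dumps the bytes for a few code points under each encoding. It uses the big-endian codecs so that no byte order mark shows up in the output:

```python
# Illustrative sketch only: compare the byte sequences that three
# encodings produce for the same code point.
def encoded_hex(ch: str) -> dict[str, str]:
    """Return the hex-encoded bytes of one character per encoding."""
    return {
        # Fixed width: always 4 bytes, essentially the code point itself.
        "utf-32": ch.encode("utf-32-be").hex(),
        # 2 bytes inside the BMP, a 4-byte surrogate pair outside it.
        "utf-16": ch.encode("utf-16-be").hex(),
        # 1 to 4 bytes; ASCII characters stay a single byte.
        "utf-8": ch.encode("utf-8").hex(),
    }

# U+0041, U+00E9, U+20AC, and U+1F600 (outside the BMP).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_hex(ch))
```

The last character is the interesting one: UTF-32 still spends exactly 4 bytes, UTF-16 has to fall back to a surrogate pair, and UTF-8 uses its longest 4-byte form.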

The advantages/disadvantages of the various encodings are something that could eat up several pieces, each as long as the entire post, and for fun I'd probably throw in weird stuff like the attempt to do an EBCDIC-compatible UTF instead of an ASCII-compatible one, etc.


Someone should write up EBCDIC-based UTF as an RFC. I'm sure that there's at least one COBOL programmer out there who has been waiting for that for decades.

ETA: Mostly a joke, but it would also fit right in with things like WTF-8 (https://simonsapin.github.io/wtf-8/)

It wasn't a joke. UTF-EBCDIC is a Unicode Technical Report:

http://www.unicode.org/reports/tr16/

aw, now i'm cranky again. lol