HN Mirror

Hacker News


	by ubernostrum 3210 days ago
	That section was written for people who know little to nothing about Unicode or the ways Unicode can be encoded to bytes. So it starts with the obvious approach (just spit out a sequence of fixed-width integers whose values are the code points, which is as near as makes no difference to how UTF-32 works), then introduces variable-width encoding through the history of UCS-2 and UTF-16, and then gets to UTF-8 and what motivated it. The advantages and disadvantages of the various encodings could fill several pieces as long as the entire post, and for fun I'd probably throw in weird stuff like the attempt to do an EBCDIC-compatible UTF instead of an ASCII-compatible one, etc.
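To make that progression concrete, here is a minimal Python sketch that lays one string out in each of the three encodings. The sample string and the helper names (`code_points`, `utf16_units`, `utf32_matches_code_points`, `utf16_matches_codec`) are illustrative choices, not anything from the post. The codecs `utf-32-be`, `utf-16-be` and `utf-8` are Python built-ins; the `-be` variants are used so no byte-order mark is emitted. It shows UTF-32 storing each code point as one 4-byte integer, UTF-16 needing a surrogate pair for a character outside the Basic Multilingual Plane (where UCS-2 has no representation at all), and UTF-8 using 1 to 4 bytes per character.

```python
# Sample string: 'A' (ASCII), 'é', '€', and an emoji outside the BMP.
text = "A\u00e9\u20ac\U0001F600"


def code_points(s: str) -> list[int]:
    """Return the integer code point of each character."""
    return [ord(ch) for ch in s]


def utf16_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (surrogate pair if needed)."""
    if cp < 0x10000:
        return [cp]  # BMP character: one 16-bit unit, identical to UCS-2
    v = cp - 0x10000  # 20 bits to split across two units
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def utf32_matches_code_points(s: str) -> bool:
    """Check that UTF-32 is literally the code points as 4-byte integers."""
    data = s.encode("utf-32-be")
    ints = [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]
    return ints == code_points(s)


def utf16_matches_codec(s: str) -> bool:
    """Check the hand-rolled surrogate logic against Python's UTF-16 codec."""
    data = s.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    manual = [u for cp in code_points(s) for u in utf16_units(cp)]
    return units == manual


for ch in text:
    print(f"U+{ord(ch):04X}: utf-32={len(ch.encode('utf-32-be'))}B "
          f"utf-16={len(ch.encode('utf-16-be'))}B "
          f"utf-8={len(ch.encode('utf-8'))}B")
# U+0041: utf-32=4B utf-16=2B utf-8=1B
# U+00E9: utf-32=4B utf-16=2B utf-8=2B
# U+20AC: utf-32=4B utf-16=2B utf-8=3B
# U+1F600: utf-32=4B utf-16=4B utf-8=4B
```

The per-character sizes show the trade-off in miniature: UTF-32 is uniform but always spends four bytes, while UTF-8 keeps ASCII at one byte (the ASCII compatibility that motivated it) at the cost of variable width.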


WorldMaker 3209 days ago

Someone should write up EBCDIC-based UTF as an RFC. I'm sure that there's at least one COBOL programmer out there who has been waiting for that for decades.

ETA: Mostly a joke, but it would also fit right in with things like WTF-8 (https://simonsapin.github.io/wtf-8/)


ubernostrum 3209 days ago

It wasn't a joke. UTF-EBCDIC is specified in Unicode Technical Report #16:

http://www.unicode.org/reports/tr16/


carapace 3209 days ago

aw, now i'm cranky again. lol
