Hacker News
by chuckadams 33 days ago
> It would have been expensive, but all characters should have been fixed size 64bit values.

It would have been a non-starter, and then we'd all be dealing with Shift-JIS, BIG5, and FSM knows how many different codepages to this day. UTF-8 is about as elegant as it gets, though Java and JS still managed to fuck that up too (they both encode every codepoint outside the BMP as surrogate pairs in UTF-8).

2 comments

> Java and JS […] both encode every codepoint outside the BMP as surrogate pairs in UTF-8

I can’t comment on Java, but JS I know reasonably well and I can’t think of any place it uses CESU-8.

Java doesn’t either.
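
For anyone unsure what the distinction is: standard UTF-8 encodes a codepoint outside the BMP as one 4-byte sequence, while CESU-8 encodes each half of its UTF-16 surrogate pair as a separate 3-byte sequence, 6 bytes in total. A rough sketch in Python, picked only as a neutral language for illustration. `cesu8_encode` is a hypothetical helper written for this example, since Python ships no CESU-8 codec:

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    Each UTF-16 code unit is encoded on its own, so a surrogate pair
    becomes two 3-byte sequences instead of one 4-byte sequence.
    """
    # Split the text into big-endian UTF-16 code units (2 bytes each).
    utf16 = text.encode("utf-16-be", "surrogatepass")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets a lone surrogate through the UTF-8 encoder.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"  # U+1F600, a codepoint outside the BMP

utf8 = emoji.encode("utf-8")
cesu8 = cesu8_encode(emoji)

print(utf8.hex())   # f09f9880      (4 bytes, standard UTF-8)
print(cesu8.hex())  # eda0bdedb880  (6 bytes, surrogate pair D83D DE00)
```

For text that stays inside the BMP the two encodings produce identical bytes, so the difference only shows up with supplementary characters such as emoji.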