Hacker News
by rspeer 2753 days ago
A couple reasons I'm aware of:

- Han unification. Chinese and Japanese characters that have the same etymology but are written differently are sometimes assigned to the same codepoint, and it's up to the font to distinguish. And application designers... tend not to detect the language and change the font accordingly. This leads to Japanese text being rendered in a Chinese font.

- Some Han de-unification has happened in Unicode now. Great. Which version should fonts and encodings support?

- Nobody actually knows the complete de facto mapping between Shift-JIS and Unicode. Yes, there are standards and ICU and Python modules and stuff; they're incomplete. This leads to data loss surrounding rare characters.

(Tell me you've got such a mapping and I'll give you some strings I found in the wild to decode with it.)
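A minimal sketch of the mapping problem, assuming Python's built-in `shift_jis` and `cp932` codecs as two examples of "Shift-JIS to Unicode" tables. The sample bytes and the helper names (`decode_or_none`, `compare`) are made up for illustration; they are not strings from the wild, and this only shows that the tables disagree, not which one is right.

```python
# Two byte sequences that often appear in text labeled "Shift-JIS".
# (Illustrative samples chosen for this sketch, not real-world data.)
SAMPLES = {
    "wave dash / fullwidth tilde": b"\x81\x60",
    "NEC circled digit one": b"\x87\x40",
}


def decode_or_none(data: bytes, codec: str):
    """Return the decoded string, or None if the codec has no mapping."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def compare(data: bytes) -> dict:
    """Decode the same bytes with both codecs and list the code points."""
    result = {}
    for codec in ("shift_jis", "cp932"):
        text = decode_or_none(data, codec)
        if text is None:
            result[codec] = None  # this table cannot represent the bytes
        else:
            result[codec] = [f"U+{ord(ch):04X}" for ch in text]
    return result


if __name__ == "__main__":
    for label, data in SAMPLES.items():
        print(label, data.hex(), compare(data))
```

With CPython's bundled codecs, the first sample should come out as U+301C under `shift_jis` but U+FF5E under `cp932`, and the second should decode only under `cp932`. Picking the wrong table either changes the character or loses it, and rarer characters are where the tables stop agreeing at all.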

1 comment

>Han unification.

I am still hoping that some day down the road we can fix this without stepping on each other's cultures, fonts, and glyphs.