by wycats 4195 days ago
Han unification is one problem; another problem is that not all encodings can be round-tripped losslessly through Unicode. Shift-JIS, for example, has multiple separate characters that convert into the same character in Unicode, and therefore cannot be converted back into their original form reliably.
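
To make the round-trip problem concrete, here is a rough sketch. It assumes Python's built-in `cp932` codec (Microsoft's Shift-JIS variant) as a stand-in for "Shift-JIS"; other Shift-JIS variants may map these bytes differently. The two byte sequences and the `round_trips` helper are illustrative choices, not something from the comment above.

```python
# Two distinct CP932 byte sequences for the square root sign (assumed here
# as the example): 0x81E3 is the JIS X 0208 position, 0x8795 is the
# duplicate in the NEC row 13 extension.
JIS_SQRT = b"\x81\xe3"
NEC_SQRT = b"\x87\x95"


def round_trips(raw: bytes, codec: str = "cp932") -> bool:
    """Return True if decoding to Unicode and re-encoding yields the same bytes."""
    return raw.decode(codec).encode(codec) == raw


if __name__ == "__main__":
    a = JIS_SQRT.decode("cp932")
    b = NEC_SQRT.decode("cp932")
    # Both byte sequences collapse to one Unicode code point...
    print(a == b, hex(ord(a)))
    # ...so at most one of them can survive the trip back to bytes.
    print(round_trips(JIS_SQRT), round_trips(NEC_SQRT))
```

Since encoding picks a single byte sequence per code point, whichever duplicate the encoder does not prefer is silently rewritten, and the original bytes cannot be recovered from the Unicode string alone.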
2 comments

The Shift-JIS issue seems to be a flaw in the design of Shift-JIS itself, which leaves even symbols like the square root sign without a canonical encoding. At what point do you draw the line and tell developers that they need to deal with such things themselves? No one is taking away byte arrays. Fragmenting the userbase seems suboptimal.