by keithwinstein 4704 days ago
Not quite. The Wikipedia article shows the difference between U+0660 .. U+0669 (Arabic-Indic digits) on the top row and U+06F0 .. U+06F9 (Eastern Arabic-Indic digits) on the bottom row.
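To make the two rows concrete, here is a minimal Python sketch that lists both ranges with their Unicode character names and digit values. Python is an arbitrary choice, and the constant and function names are illustrative, not from any source. Note that Unicode's own character names for U+06F0 .. U+06F9 use the word "EXTENDED" rather than "Eastern".

```python
import unicodedata

# Arabic-Indic digits, U+0660..U+0669 (the top row in the Wikipedia table).
ARABIC_INDIC = [chr(cp) for cp in range(0x0660, 0x066A)]
# U+06F0..U+06F9 (the bottom row). Unicode names these
# "EXTENDED ARABIC-INDIC DIGIT ZERO" through "... NINE".
EXTENDED_ARABIC_INDIC = [chr(cp) for cp in range(0x06F0, 0x06FA)]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (value N)' for a single digit character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (value {unicodedata.digit(ch)})"


for top, bottom in zip(ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
    print(f"{describe(top):45} | {describe(bottom)}")

# Both rows are decimal digits, so int() accepts either one.
print(int("\u0664\u0662"), int("\u06f4\u06f2"))  # prints "42 42"
```

The two rows differ by a fixed offset of 0x90 and each pair has the same digit value, so software can treat them as the same numbers. What the code point does not record is which regional shape a given U+06F4 should take; that is left to the font and the language of the text.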

But what I'm talking about are the different glyphs used to represent the bottom row (U+06F0 .. U+06F9) depending on whether the text is in Persian, Sindhi, or Urdu. See http://www.unicode.org/versions/Unicode6.2.0/ch08.pdf, table 8-2.

There is also the issue I mentioned about Chinese vs. Japanese glyphs for the same coded character, which is at least as important in practice.
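Because one coded character can call for different glyphs, the language has to travel alongside the text. The sketch below assumes an HTML output path and pairs each string with a BCP 47 language tag. `TaggedText` is a hypothetical helper, not a library API, and U+76F4 (直) is simply chosen for illustration. Whether a browser actually switches glyphs on the `lang` attribute depends on the fonts available.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Text plus a BCP 47 language tag (hypothetical helper for illustration).

    The code points alone cannot say whether U+76F4 should be drawn in the
    Chinese or the Japanese style, or which regional shape U+06F4 should get,
    so the language is stored next to the text.
    """
    text: str
    lang: str  # e.g. "zh-Hans", "ja", "fa" (Persian), "sd" (Sindhi), "ur" (Urdu)

    def to_html(self) -> str:
        # The lang attribute lets a renderer choose language-specific glyphs,
        # provided the font in use actually has them.
        return f'<span lang="{html.escape(self.lang)}">{html.escape(self.text)}</span>'


zh = TaggedText("\u76f4", "zh-Hans")
ja = TaggedText("\u76f4", "ja")
fa = TaggedText("\u06f4", "fa")
ur = TaggedText("\u06f4", "ur")

# Identical bytes on the wire; only the tag distinguishes them.
assert zh.text.encode("utf-8") == ja.text.encode("utf-8")
assert fa.text.encode("utf-8") == ur.text.encode("utf-8")

for t in (zh, ja, fa, ur):
    print(t.to_html())
```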


This is an issue with CJK characters and probably just one more reason why UTF-8 adoption has been slow where JIS is good enough.