by keithwinstein
4704 days ago
Just to be a bit pedantic, unfortunately you don't get "proper i18n support" just by putting everything in UTF-8. Unicode lets you represent abstract characters from many languages and scripts in one character set, but that doesn't tell you how to render them. For that, you need to know what language the text is in. Unicode wants you to provide that information out-of-band, e.g. in an HTML "lang" attribute, which the renderer can use to pick the proper glyphs. For example, the Extended Arabic-Indic digits 4 through 7 (۴ U+06F4 .. ۷ U+06F7) have different glyphs in Persian, Sindhi, and Urdu. And a character like 直 (U+76F4) has Chinese and Japanese glyphs that may not be mutually recognizable. Bottom line: if you want an internationalized system that can store and render multilingual text, storing the text in Unicode is a good start, but you will also need to store additional info (like the language) to render it properly.
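To make that concrete, here is a minimal sketch in Python of keeping the language next to the text and emitting it as an HTML "lang" attribute. The names `TaggedText` and `render_html` are made up for illustration, and the tags "ja" and "zh-Hans" are just example BCP 47 language tags. Which glyph actually gets painted still depends on the renderer and the fonts it has.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical record: a Unicode string plus its out-of-band language."""
    text: str  # the characters themselves
    lang: str  # BCP 47 language tag, e.g. "ja", "zh-Hans", "fa"


def render_html(item: TaggedText) -> str:
    """Wrap the text in a span whose lang attribute tells the renderer the language."""
    return f'<span lang="{escape(item.lang, quote=True)}">{escape(item.text)}</span>'


# Same code point (U+76F4), two different language tags.
same_char = "\u76f4"
ja = render_html(TaggedText(same_char, "ja"))
zh = render_html(TaggedText(same_char, "zh-Hans"))
print(ja)
print(zh)
```

The stored string is identical in both cases; only the tag differs, and the tag is what a browser can use to choose a Japanese or a Chinese font.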
You probably do need to know the language for things like sorting, comparison, and regex matching. But if you're just storing and displaying user-entered strings and your software has no need to understand the meaning of the strings, I think it's enough to do what the parent says.
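As a rough illustration of the sorting point, here is a toy Python sketch. `toy_sort_key` is a hypothetical helper, not a real collation: it only folds ä/ö/ü for German (roughly how German dictionaries order them) and otherwise falls back to code point order, which for this tiny sample happens to leave Ä after Z as Swedish does. Real code would want proper collation data (e.g. from ICU/CLDR) rather than anything like this.

```python
# Toy folding table: German dictionary order treats umlauts like their base vowels.
GERMAN_FOLD = str.maketrans({"\u00e4": "a", "\u00f6": "o", "\u00fc": "u"})


def toy_sort_key(word: str, lang: str) -> str:
    """Hypothetical, deliberately incomplete sort key that depends on the language."""
    key = word.casefold()
    if lang == "de":
        return key.translate(GERMAN_FOLD)
    return key  # no language rules: plain code point order of the casefolded text


words = ["Zug", "\u00c4rger", "Apfel"]

by_code_point = sorted(words)
german = sorted(words, key=lambda w: toy_sort_key(w, "de"))
swedish = sorted(words, key=lambda w: toy_sort_key(w, "sv"))

print(by_code_point)  # ['Apfel', 'Zug', 'Ärger']
print(german)         # ['Apfel', 'Ärger', 'Zug']
print(swedish)        # ['Apfel', 'Zug', 'Ärger']
```

The same three strings come out in a different order depending on the language you assume, which is why the language has to be known for sorting even though it doesn't matter for plain storage.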