by masklinn
5190 days ago
> The problem is that Unicode doesn't know about language. Unicode is just characters.

I won't blame you for this, it is a common mistake, but Unicode goes far beyond merely mapping characters to integers. The Standard Annexes, Technical Reports and Technical Standards cover pretty much all things localization, from line breaking [UAX14] to regular expressions [UTS18], through date and time formatting [UTS35] and sorting [UTS10].

And as it turns out, both uppercasing and titlecasing are covered by [UAX44] as part of the SpecialCasing.txt file, which provides lower-, upper- and title-casing (along with optional conditions) for characters with non-trivial mappings (trivial 1:1 mappings are covered in the base UnicodeData.txt file).

[UAX14] http://www.unicode.org/reports/tr14/
[UTS18] http://www.unicode.org/reports/tr18/
[UTS35] http://www.unicode.org/reports/tr35/
[UTS10] http://www.unicode.org/reports/tr10/
[UAX44] http://www.unicode.org/reports/tr44/
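To make the difference between trivial and non-trivial mappings concrete, here is a minimal sketch in Python 3, assuming its built-in `str.upper()` and `str.title()` apply the unconditional full case mappings (which, as far as I know, they do). The sample characters and variable names are my own picks for illustration, not something taken from the files themselves.

```python
# Non-trivial, one-to-many mappings of the kind listed in SpecialCasing.txt.
sharp_s = "\u00df"       # LATIN SMALL LETTER SHARP S
fi_ligature = "\ufb01"   # LATIN SMALL LIGATURE FI

upper_sharp_s = sharp_s.upper()    # one character becomes two
upper_fi = fi_ligature.upper()     # ligature expands to two letters

# A simple 1:1 mapping (UnicodeData.txt) where titlecase differs from uppercase.
dz_digraph = "\u01c6"              # LATIN SMALL LETTER DZ WITH CARON
upper_dz = dz_digraph.upper()      # U+01C4, fully capitalised digraph
title_dz = dz_digraph.title()      # U+01C5, only the first part capitalised

# Conditional, language-specific entries (e.g. Turkish dotted i) are not
# applied by the plain str methods, which take no locale argument.
upper_i = "i".upper()

print(upper_sharp_s, upper_fi, upper_dz, title_dz, upper_i)
```

Note that the language-sensitive entries are exactly where the "optional conditions" column comes in: a locale-aware library (ICU, for instance) is needed to get the Turkish or Lithuanian behaviour.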