by faragon
3968 days ago
You could also consider setting a system-independent locale, e.g. "set_turkisk_mode" (I had that problem, too). I thought the only case-conversion exception was the Turkish one. Do you remember which cases are exceptions for Greek, Lithuanian and Azeri? Also, I know that German has some non-bijective cases ("ß" -> SS). If you want to save space in the tables, you can opt for encoding ranges in the code, e.g. check sc_tolower()/sc_toupper() in: https://github.com/faragon/libsrt/blob/master/src/schar.c
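To make the "encode ranges in the code" idea concrete, here is a minimal sketch. It is not taken from libsrt; the names `case_range`, `lower_ranges` and `sketch_tolower` are made up for illustration, and it only covers a handful of blocks with simple one-to-one mappings. It ignores locale rules and special cases such as U+0130.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range record (not from libsrt): every code point in
 * [first, last] that sits a multiple of `step` away from `first` is
 * uppercase, and its lowercase form is `cp + delta`. */
struct case_range {
    uint32_t first;
    uint32_t last;
    uint32_t delta;
    uint32_t step;
};

/* A few illustrative ranges; a real table would cover far more. */
static const struct case_range lower_ranges[] = {
    { 0x0041, 0x005A, 32, 1 }, /* A-Z */
    { 0x00C0, 0x00D6, 32, 1 }, /* Latin-1 uppercase before the multiplication sign */
    { 0x00D8, 0x00DE, 32, 1 }, /* Latin-1 uppercase after the multiplication sign */
    { 0x0100, 0x012E,  1, 2 }, /* Latin Extended-A: upper/lower alternate */
    { 0x0391, 0x03A1, 32, 1 }, /* Greek Alpha-Rho */
    { 0x03A3, 0x03AB, 32, 1 }, /* Greek Sigma-Upsilon with dialytika */
    { 0x0410, 0x042F, 32, 1 }, /* Cyrillic A-Ya */
};

/* Simple (one code point to one code point) lowercase mapping.
 * Code points outside the ranges above are returned unchanged. */
uint32_t sketch_tolower(uint32_t cp)
{
    size_t i;
    for (i = 0; i < sizeof(lower_ranges) / sizeof(lower_ranges[0]); i++) {
        const struct case_range *r = &lower_ranges[i];
        if (cp >= r->first && cp <= r->last &&
            (cp - r->first) % r->step == 0)
            return cp + r->delta;
    }
    return cp;
}
```

Seven small records stand in for well over a hundred individual table entries here, which is where the space saving comes from. The price is that irregular mappings still need separate handling.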
As you get down the list you'll notice what a pain in the ass the special cases are. There's a special case for the final sigma in a Greek word.
You must remove the dot from "i" when upper- or titlecasing... but only in Lithuanian. Etc. etc. By the way, my implementation of case mapping started out similar to yours, but I ultimately solved the problem with a binary search in a huge look-up table: https://bitbucket.org/knight666/utf8rewind/src/c22e458912952... Unicode case mapping is just a huge mess of exceptions, but that's more the humans' fault than the standard's.
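For anyone curious what the look-up-table approach looks like, here is a rough sketch of a binary search over a sorted table. This is not the utf8rewind code; `case_entry`, `upper_table` and `sketch_toupper` are hypothetical names, the table holds only a few sample entries, and it handles one-to-one mappings only (no "ß" -> "SS", no locale rules).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one code point and its simple uppercase form. */
struct case_entry {
    uint32_t from;
    uint32_t to;
};

/* Must stay sorted by `from`, otherwise the binary search breaks.
 * A real table would have well over a thousand entries. */
static const struct case_entry upper_table[] = {
    { 0x0061, 0x0041 }, /* a -> A */
    { 0x00E9, 0x00C9 }, /* e with acute */
    { 0x00FF, 0x0178 }, /* y with diaeresis: target is far away */
    { 0x03C2, 0x03A3 }, /* Greek final sigma -> capital sigma */
    { 0x03C3, 0x03A3 }, /* Greek small sigma -> capital sigma */
    { 0x0451, 0x0401 }, /* Cyrillic io */
};

/* Returns the uppercase form if `cp` is in the table, else `cp` itself. */
uint32_t sketch_toupper(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(upper_table) / sizeof(upper_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (upper_table[mid].from == cp)
            return upper_table[mid].to;
        if (upper_table[mid].from < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return cp;
}
```

Note that both sigmas map to the same capital, so going back to lowercase cannot be a plain table lookup: picking the final or the regular sigma depends on the surrounding text.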