by Sesse__
179 days ago
It's why the Unicode Collation Algorithm exists. If you look in allkeys.txt (the base UCA data, used if you don't have language-specific stuff in your comparisons) for the two code points in question, you'll find:

  004B ; [.2514.0020.0008] # LATIN CAPITAL LETTER K
  212A ; [.2514.0020.0008] # KELVIN SIGN
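
To make the format concrete, here is a minimal sketch that reads lines like the two above and checks that their weights match. It assumes the simple single-element `[.p.s.t]` form quoted here; real allkeys.txt entries can carry several bracketed elements and use `*` instead of `.` for variable-weighted characters, which this sketch does not handle. `parse_entry` and `LINE_RE` are hypothetical names, not part of any UCA library.

```python
import re

# Matches the simple one-element entries quoted above, e.g.
#   004B ; [.2514.0020.0008] # LATIN CAPITAL LETTER K
# Assumption: multi-element entries and '*' (variable) markers are not supported.
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})\s*;\s*\[\.([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]\s*#\s*(.*)$"
)

def parse_entry(line):
    """Return (code point, (level1, level2, level3), name) for one entry line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported line: {line!r}")
    cp, l1, l2, l3, name = m.groups()
    # Code points and weights in allkeys.txt are written in hexadecimal.
    return int(cp, 16), (int(l1, 16), int(l2, 16), int(l3, 16)), name

k = parse_entry("004B ; [.2514.0020.0008] # LATIN CAPITAL LETTER K")
kelvin = parse_entry("212A ; [.2514.0020.0008] # KELVIN SIGN")
print(k[1] == kelvin[1])  # True: same weights on all three levels
```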
The numbers in the brackets are the weights on level 1 (base letter), level 2 (typically used for accents) and level 3 (typically used for case). So the two compare as identical under the UCA in almost every case, except if you really need a tiebreaker. Compare e.g.:

  1D424 ; [.2514.0020.0005] # MATHEMATICAL BOLD SMALL K

which would compare equal to those under a case-insensitive, accent-sensitive collation, but _not_ under a case-sensitive one (case-sensitive collations are always accent-sensitive, too).
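
A rough sketch of how collation strength decides this, using only the weights quoted above: a collation of strength n looks at the first n levels and ignores the rest. This is a simplification under stated assumptions; a real UCA implementation normalizes the input and builds a sort key for a whole string (all level 1 weights, then all level 2 weights, and so on) rather than comparing one character's tuple. `WEIGHTS` and `equal_at_strength` are hypothetical names for illustration.

```python
# Weights copied from the allkeys.txt lines quoted above:
# (level 1 = base letter, level 2 = accents, level 3 = case/variant).
WEIGHTS = {
    "\u004B": (0x2514, 0x0020, 0x0008),      # LATIN CAPITAL LETTER K
    "\u212A": (0x2514, 0x0020, 0x0008),      # KELVIN SIGN
    "\U0001D424": (0x2514, 0x0020, 0x0005),  # MATHEMATICAL BOLD SMALL K
}

def equal_at_strength(a, b, strength):
    """Compare two single characters using only the first `strength` levels.

    strength=1: base letter only (case- and accent-insensitive)
    strength=2: plus accents (case-insensitive, accent-sensitive)
    strength=3: plus case (case- and accent-sensitive)
    """
    return WEIGHTS[a][:strength] == WEIGHTS[b][:strength]

print(equal_at_strength("\u004B", "\u212A", 3))      # True: K vs Kelvin sign
print(equal_at_strength("\u212A", "\U0001D424", 2))  # True: accent-sensitive only
print(equal_at_strength("\u212A", "\U0001D424", 3))  # False: level 3 differs
```

Because each strength simply includes all the levels below it, a level 3 (case-sensitive) comparison can never ignore level 2, which is why case-sensitive collations are accent-sensitive too.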