by zahlman
616 days ago
Python's strings have uppercase, lowercase and case-folding methods that don't choke on this. Python doesn't use UTF-16 internally. It can use UCS-2 for strings whose code points all fit in that range, and while a string may store code points from the surrogate range, they're never interpreted as surrogate pairs. Instead they serve as an error encoding, so that e.g. invalid UTF-8 can be round-tripped. So the casing methods never have to worry about surrogate pairs, and they know a few things about localized text casing:
>>> 'ß'.upper()
'SS'
>>> 'ß'.lower()
'ß'
>>> 'ß'.casefold()
'ss'
There are a lot of really complicated tasks for Unicode strings. String casing isn't really one of them. (No, Python can't turn 'SS' back into 'ß'. But doing that requires metadata about language that a string simply doesn't represent.)
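For anyone who wants to poke at this, here is a minimal sketch of the two points above: caseless comparison via casefold(), and the lone-surrogate round trip for invalid UTF-8. The helper name caseless_equal and the sample inputs ('Straße', b'caf\xff') are made up for illustration. The 'surrogateescape' error handler is a standard codec option, but treat the rest as one way to do it, not the canonical one.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings ignoring case via casefold()."""
    return a.casefold() == b.casefold()


# casefold() maps 'ß' to 'ss', so these compare equal...
print(caseless_equal('Straße', 'STRASSE'))    # True
# ...while lower() leaves 'ß' alone, so a lower()-based comparison misses it.
print('Straße'.lower() == 'STRASSE'.lower())  # False

# Invalid UTF-8 round trip: the bad byte 0xFF is stored as the lone
# surrogate U+DCFF and turned back into the same byte on encode.
raw = b'caf\xff'  # made-up sample; 0xFF is never valid in UTF-8
text = raw.decode('utf-8', errors='surrogateescape')
print(ascii(text))                            # 'caf\udcff'

# The casing methods pass the lone surrogate through untouched.
shouted = text.upper()
print(shouted.encode('utf-8', errors='surrogateescape') == b'CAF\xff')  # True
```

Note this doesn't do any normalization (NFC/NFKC), which you'd probably want on top for real-world caseless matching.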