Hacker News
by
CRConrad
738 days ago
I'm thinking even bog-standard European umlauts, cedillas, etc., go multi-byte in UTF-8? (Take a string like ÅÄÖåäöÜü, chop it off at various byte limits, and see.)
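A minimal Python sketch of that experiment. The helper names `cut_bytes` and `truncate_utf8` are hypothetical, for illustration only. Each of these precomposed letters (U+00C5 and friends) encodes to two bytes in UTF-8, so any odd byte limit lands in the middle of a character. Decoding with `errors="ignore"` is one simple way to drop the dangling partial sequence. It assumes the input was valid UTF-8 to begin with, so the only damage is at the cut.

```python
def cut_bytes(s: str, max_bytes: int) -> bytes:
    """Naively chop the UTF-8 encoding of s at max_bytes (may split a character)."""
    return s.encode("utf-8")[:max_bytes]


def truncate_utf8(s: str, max_bytes: int) -> str:
    """Keep at most max_bytes of UTF-8 without emitting a broken character.

    errors="ignore" silently drops the incomplete multi-byte sequence
    left at the end by the cut.
    """
    return cut_bytes(s, max_bytes).decode("utf-8", errors="ignore")


# Written as escapes so the source file's own normalization can't change it.
s = "\u00c5\u00c4\u00d6\u00e5\u00e4\u00f6\u00dc\u00fc"  # ÅÄÖåäöÜü
print(len(s), len(s.encode("utf-8")))  # 8 16

for limit in range(1, 6):
    try:
        cut_bytes(s, limit).decode("utf-8")
        status = "ok"
    except UnicodeDecodeError:
        status = "broken (cut mid-character)"
    print(limit, status, repr(truncate_utf8(s, limit)))
```

Every odd limit reports a broken cut, and the safe version simply rounds down to the previous whole character.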
gmueckl
738 days ago
This is just the general behavior of truncating strings by code point when they contain decomposed characters (a base letter followed by combining marks). Accents and other diacritics get hit the same way.
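A small Python sketch of the decomposed case, using the standard `unicodedata` module. In NFD, "Å" becomes "A" followed by U+030A COMBINING RING ABOVE, so a naive code-point slice can keep the base letter and lose its mark. The helpers are hypothetical. `truncate_keep_marks` is only a rough approximation: it refuses to cut right before a character with a nonzero canonical combining class. Proper segmentation needs the grapheme cluster rules of UAX #29, which this does not implement.

```python
import unicodedata


def truncate_codepoints(s: str, n: int) -> str:
    """Naive: keep the first n code points, even if that strands a base letter."""
    return s[:n]


def truncate_keep_marks(s: str, n: int) -> str:
    """Approximation: back off so the cut never falls just before a combining
    mark (nonzero canonical combining class). Not full UAX #29 segmentation."""
    if n >= len(s):
        return s
    while n > 0 and unicodedata.combining(s[n]):
        n -= 1
    return s[:n]


# "ÅÄÖ" decomposed: A + U+030A, A + U+0308, O + U+0308 -> 6 code points.
s = unicodedata.normalize("NFD", "\u00c5\u00c4\u00d6")
print(len(s))  # 6

naive = truncate_codepoints(s, 3)   # "Å" plus a bare "A": the umlaut is gone
safer = truncate_keep_marks(s, 3)   # drops the stranded "A" instead
print(repr(naive), repr(safer))
print(unicodedata.normalize("NFC", safer) == "\u00c5")  # True
```

Normalizing to NFC before truncating would sidestep this for these particular letters. It does not help with sequences that have no precomposed form, which is why a grapheme-aware approach is the more general fix.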
panzi
738 days ago
I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.