by CRConrad 738 days ago
I'm thinking even bog-standard European umlauts, cedillas, etc. go multi-byte in UTF-8? (Take a string of ÅÄÖåäöÜü and chop it off at various byte limits and see.)
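Here is a minimal Python sketch of the experiment suggested above. It assumes UTF-8 is the byte encoding. Each of these precomposed letters is a single code point but two bytes in UTF-8, so any odd byte limit splits a character in half. The helper names `truncate_bytes_naive` and `truncate_utf8_safe` are hypothetical, for illustration only.

```python
# Assumption: UTF-8 is the byte encoding being truncated.
# Escapes spell "ÅÄÖåäöÜü" in precomposed (NFC) form: 8 code points.
s = "\u00c5\u00c4\u00d6\u00e5\u00e4\u00f6\u00dc\u00fc"


def truncate_bytes_naive(text: str, limit: int) -> bytes:
    """Hypothetical helper: chop the UTF-8 encoding at an arbitrary byte limit."""
    return text.encode("utf-8")[:limit]


def truncate_utf8_safe(text: str, limit: int) -> str:
    """Hypothetical helper: chop at a byte limit, then drop a trailing partial character.

    errors="ignore" discards the incomplete multi-byte sequence left at the end.
    The input is valid text, so that is the only invalid data that can occur.
    """
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(len(s), "code points,", len(s.encode("utf-8")), "bytes")
    for limit in (1, 2, 3, 5):
        chunk = truncate_bytes_naive(s, limit)
        try:
            chunk.decode("utf-8")
            status = "valid"
        except UnicodeDecodeError:
            status = "split mid-character"
        print(limit, chunk, status, repr(truncate_utf8_safe(s, limit)))
```

Any odd limit leaves a dangling lead byte (0xC3 here), which a strict decoder rejects. The safe variant simply backs off to the last complete character.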

This is just the general behavior of truncating strings by code point when they contain decomposed characters (a base letter followed by combining marks). This can also affect accents, etc.
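The sketch below illustrates the point using Python's standard `unicodedata` module. In NFD (decomposed) form, "Å" becomes "A" followed by U+030A COMBINING RING ABOVE. A cut by code point can therefore keep the base letter and silently drop the mark. The helper `truncate_keep_marks` is hypothetical. It only backs off over combining marks, which is a simplification and not full grapheme-cluster segmentation as defined in Unicode UAX #29.

```python
import unicodedata

composed = "\u00c5\u00c4\u00d6"  # "ÅÄÖ", precomposed (NFC): 3 code points
decomposed = unicodedata.normalize("NFD", composed)  # base letter + combining mark each: 6 code points


def truncate_code_points(text: str, n: int) -> str:
    """Plain code-point truncation; can separate a letter from its accent."""
    return text[:n]


def truncate_keep_marks(text: str, n: int) -> str:
    """Hypothetical helper: never cut between a base character and its combining marks.

    Simplified heuristic only: it ignores other grapheme rules (ZWJ emoji
    sequences, Hangul jamo, etc.) covered by UAX #29.
    """
    cut = min(n, len(text))
    # If the first character being dropped is a combining mark, it belongs to the
    # previous character, so back off until the cut lands on a cluster boundary.
    while 0 < cut < len(text) and unicodedata.combining(text[cut]):
        cut -= 1
    return text[:cut]


if __name__ == "__main__":
    print(len(composed), len(decomposed))
    print(repr(truncate_code_points(decomposed, 1)))  # bare "A", ring above lost
    print(repr(unicodedata.normalize("NFC", truncate_keep_marks(decomposed, 3))))
```

Backing off drops the whole accented letter rather than emitting a base letter without its accent. For real text a proper grapheme segmenter is the safer choice.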
I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.