by asabil 739 days ago
Yes, but you don’t end up with different glyphs. Arabic script has letter shaping, which means a letter can have up to 4 shapes depending on its position within the word. If you chop off the last letter, the previous one, which used to have a “middle” position shape, suddenly changes into its “terminal” position shape.

I'm thinking even bog-standard European umlauts, cedillas, etc. go multi-byte in UTF-8? (Take a string of ÅÄÖåäöÜü and chop it off at various byte limits and see.)
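For anyone who wants to try the experiment, here is a rough Python sketch. It assumes the string is UTF-8 encoded and in precomposed (NFC) form; `truncate_bytes` is a made-up helper name, not something from a library.

```python
import unicodedata

# Force the precomposed (NFC) form so each letter is a single code point.
s = unicodedata.normalize("NFC", "ÅÄÖåäöÜü")
data = s.encode("utf-8")  # each of these letters takes 2 bytes in UTF-8


def truncate_bytes(data: bytes, limit: int) -> str:
    """Hypothetical helper: cut at a byte limit, then decode, silently
    dropping an incomplete multi-byte sequence left at the end."""
    return data[:limit].decode("utf-8", errors="ignore")


for limit in range(len(data) + 1):
    try:
        data[:limit].decode("utf-8")  # strict decode of the raw cut
        status = "ok"
    except UnicodeDecodeError:
        status = "cut mid-character"
    print(limit, status, repr(truncate_bytes(data, limit)))
```

With this particular string every odd byte limit lands in the middle of a character, so a strict decode fails there and a lenient one quietly loses the last letter.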
This is just the general behavior of truncating strings by code point when they contain decomposed characters (a base letter followed by combining marks). This can also affect accents, etc.
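A small Python sketch of that code point case, assuming the text is in decomposed (NFD) form. `truncate_keep_marks` is a hypothetical helper that only looks at combining marks; proper grapheme cluster segmentation covers more cases than this.

```python
import unicodedata

# NFD splits each letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", "Åäö")  # 3 letters, 6 code points

# Naive cut by code point: the final "o" is kept but its diaeresis is lost.
naive = decomposed[:5]


def truncate_keep_marks(s: str, limit: int) -> str:
    """Hypothetical helper: back off so the cut never falls between a
    base letter and the combining marks that follow it."""
    end = min(limit, len(s))
    while 0 < end < len(s) and unicodedata.combining(s[end]):
        end -= 1
    return s[:end]


safe = truncate_keep_marks(decomposed, 5)

print(unicodedata.normalize("NFC", naive))  # Åäo
print(unicodedata.normalize("NFC", safe))   # Åä
```

The naive cut turns "ö" into a plain "o", which changes the text rather than just shortening it; the backed-off version drops the whole letter instead.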
I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.