Hacker News
by dhosek 1608 days ago
On the challenge front, there are things like á, which might be a single code point or two code points (a+´). Then there are the really challenging things like ᾷ where, if the components are individual characters, the order of ͺ and ῀ is not guaranteed to be consistent.
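A quick Python illustration of both cases, using the standard `unicodedata` module. One assumption: the sketch uses the combining forms U+0342 (perispomeni) and U+0345 (ypogegrammeni) rather than the spacing characters ͺ and ῀, since the combining forms are what actually attach to the α.

```python
import unicodedata

# "á" as one precomposed code point (U+00E1) vs. "a" + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e1"
decomposed = "a\u0301"
print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False: same glyph, different code points

# "ᾷ" as α + combining perispomeni (U+0342) + combining ypogegrammeni (U+0345),
# with the two marks in either order
order_a = "\u03b1\u0342\u0345"
order_b = "\u03b1\u0345\u0342"
print(order_a == order_b)                 # False

# The two marks have different canonical combining classes, which is what
# lets normalization put them into one agreed order
print(unicodedata.combining("\u0342"), unicodedata.combining("\u0345"))  # 230 240
```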

Which is why these APIs should always make normalization available: https://unicode.org/reports/tr15/
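A minimal sketch of what TR15 normalization buys you, assuming Python's standard `unicodedata.normalize`: normalize both strings to the same form before comparing. `same_text` is a hypothetical helper name, and NFC as the default form is just one reasonable choice.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Composed vs. decomposed "á"
print(same_text("\u00e1", "a\u0301"))  # True

# Both mark orders for "ᾷ": NFD sorts the marks by combining class ...
order_a = "\u03b1\u0342\u0345"
order_b = "\u03b1\u0345\u0342"
print(unicodedata.normalize("NFD", order_b) == order_a)   # True
# ... and NFC then recomposes them into the single code point U+1FB7
print(unicodedata.normalize("NFC", order_b) == "\u1fb7")  # True
print(same_text(order_a, order_b))                        # True
```

Neither NFC nor NFD touches compatibility characters such as the spacing ͺ (U+037A); those only change under NFKC/NFKD.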
Then you have stuff like zalgo text (http://eeemo.net/), which takes pride in abusing combining code points.