by Aransentin 739 days ago
There are some hard-to-handle edge cases when truncating by display length in Unicode. For example, U+FDFD ("﷽") is a single code point (three bytes in UTF-8) but can render very wide depending on the typeface*, so "completely" solving it is quite hard and has to depend on feedback from your rasterization engine.

(*Rendered version on Wikipedia: https://commons.wikimedia.org/wiki/File:Lateef_unicode_U%2BF... )

1 comment

This is a completely unrelated problem, since the article is quite clearly about limiting to a maximum byte length, not display length. Display length depends on the font and shaping engine even without Unicode.
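
For what it's worth, the byte-length case can be sketched in a few lines. This is only a rough illustration, not the article's method: it assumes UTF-8 and a hypothetical helper name `truncate_utf8`, and it cuts on code point boundaries only, so it can still split a grapheme cluster (e.g. a base letter and its combining mark).

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper for illustration; cuts between code points, not graphemes.
    """
    if max_bytes <= 0:
        return ""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end in the middle of a multi-byte sequence;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "ab\ufdfd"  # U+FDFD takes three bytes in UTF-8
    print(len(sample.encode("utf-8")))
    print(repr(truncate_utf8(sample, 4)))
```

With a 4-byte limit, U+FDFD does not fit after "ab", so it is dropped whole rather than cut in half. None of this says anything about how wide the result will render.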