by masklinn 1605 days ago
> But then again, flags seem to be not only Unicode-hard but post-Unicode-hard.

Flags are not that hard: they're a very specific block, combining in a very predictable way. They're little more than ligatures. Family emoji are much harder.

And this is not "post-Unicode" in any way.

2 comments

Consider you have to split a string with 20 flags in sequence at a given offset. That's 40 codepoints with no readily discernible boundaries. To find a safe split point you have to scan backwards to the first non-flag codepoint and count how many flag codepoints precede the offset; otherwise you could split in the middle of a flag pair. You also have to handle unpaired codes, and invalid combinations that render as two glyphs. For normal codepoints with combining characters you can just scan forwards until you reach a non-combining character.
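
To make the backwards scan concrete, here is a rough sketch in Python (where strings are indexed by codepoint). The helper names `is_regional_indicator` and `safe_split_offset` are made up for illustration, and this only deals with flag pairs, not with grapheme clusters in general:

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def safe_split_offset(text: str, offset: int) -> int:
    """Return `offset`, moved back one codepoint if it would cut a flag pair.

    Hypothetical helper: it handles regional indicator pairs only.
    """
    if not 0 < offset < len(text):
        return offset
    # A flag can only be cut if both neighbours are regional indicators.
    if not (is_regional_indicator(text[offset - 1])
            and is_regional_indicator(text[offset])):
        return offset
    # Scan backwards to the first non-flag codepoint, counting as we go.
    run = 0
    i = offset - 1
    while i >= 0 and is_regional_indicator(text[i]):
        run += 1
        i -= 1
    # An odd count means `offset` sits between the two halves of a pair.
    return offset - 1 if run % 2 == 1 else offset


# "a" followed by the flags for FR and DE (four regional indicators).
sample = "a\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"
adjusted = [safe_split_offset(sample, n) for n in range(len(sample) + 1)]
```

The cost of the loop grows with the length of the flag run before the offset, which is the point being made: the answer at one position depends on everything back to the start of the run.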
> Consider you have to split a string with 20 flags in sequence at a given offset. That's 40 codepoints with no readily discernible boundaries.

So consider that you have [a really bad idea], it’s not convenient?

You do realise essentially the same issue occurs if you have a stack of diacritics, right?

No, it doesn't. You aren't forced to scan backwards.
> Flags are not that hard, they're a very specific block combining in a very predictable way.

But before their introduction, you could decide if there's a grapheme cluster break between codepoints just by looking at the two codepoints in question. Now, you may need to parse a whole sequence of codepoints to see how flags pair up.
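
A small Python sketch of what that means in practice. The function name `group_flags` is invented here, and the grouping is deliberately simplified: regional indicators pair up left to right and every other codepoint stands alone, so it ignores combining marks and all the other cluster rules:

```python
def group_flags(text: str) -> list[str]:
    """Split `text` into units, pairing regional indicators left to right.

    Simplified on purpose: anything that is not a regional indicator
    (U+1F1E6..U+1F1FF) is treated as its own unit.
    """
    def is_ri(ch: str) -> bool:
        return 0x1F1E6 <= ord(ch) <= 0x1F1FF

    units = []
    i = 0
    while i < len(text):
        if is_ri(text[i]) and i + 1 < len(text) and is_ri(text[i + 1]):
            units.append(text[i:i + 2])  # a complete pair
            i += 2
        else:
            units.append(text[i])  # ordinary codepoint or unpaired indicator
            i += 1
    return units


F, R, D, E = "\U0001F1EB", "\U0001F1F7", "\U0001F1E9", "\U0001F1EA"

with_f = group_flags(F + R + D + E)   # [F+R, D+E]: break between R and D
without_f = group_flags(R + D + E)    # [R+D, E]: no break between R and D
```

The adjacent codepoints R and D are identical in both inputs, yet whether a break falls between them depends on how many indicators came before.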