by masklinn 1605 days ago

> But then again, flags seem to be not only Unicode-hard but post-Unicode-hard.

Flags are not that hard: they're a very specific block of codepoints that combine in a very predictable way. They're little more than ligatures. Family emoji are much harder.

And this is not "post-Unicode" in any way.

2 comments

kevin_thibedeau 1605 days ago

Consider you have to split a string with 20 flags in sequence at a given offset. That's 40 codepoints with no readily discernible boundaries. To parse that you have to scan backwards to find the first non-flag codepoint. Otherwise you could split in the middle of a flag pair. You also have to handle unpaired codes, and render invalid combinations as two glyphs. For normal codepoints with combining characters you can scan forwards until you reach a non-combining character.
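A rough sketch of that backward scan in Python, for illustration only. Regional indicator symbols occupy U+1F1E6 through U+1F1FF; `safe_split_offset` is a hypothetical helper name, offsets are counted in codepoints (which is how Python 3 indexes `str`), and every grapheme rule other than flag pairing is ignored.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_regional_indicator(ch: str) -> bool:
    return RI_FIRST <= ord(ch) <= RI_LAST


def safe_split_offset(text: str, offset: int) -> int:
    """Return `offset`, moved back by one if it would cut a flag pair in half.

    Hypothetical helper: only regional-indicator pairing is considered.
    """
    if not (0 < offset < len(text)):
        return offset
    # A flag can only be cut if both neighbours are regional indicators.
    if not (is_regional_indicator(text[offset - 1])
            and is_regional_indicator(text[offset])):
        return offset
    # Scan backwards to the first non-flag codepoint, counting indicators.
    run = 0
    i = offset - 1
    while i >= 0 and is_regional_indicator(text[i]):
        run += 1
        i -= 1
    # An odd count means text[offset - 1] is the first half of a pair.
    return offset - 1 if run % 2 == 1 else offset


flags = "\U0001F1EB\U0001F1F7" * 20   # 20 flags, 40 codepoints
print(safe_split_offset(flags, 7))    # 6
print(safe_split_offset(flags, 8))    # 8
```

In the worst case the loop walks the whole run of indicators before the offset, which is the cost being pointed at here.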

masklinn 1605 days ago

> Consider you have to split a string with 20 flags in sequence at a given offset. That's 40 codepoints with no readily discernible boundaries.

So consider that you have [a really bad idea], it’s not convenient?

You do realise essentially the same issue occurs if you have a stack of diacritics, right?

kevin_thibedeau 1605 days ago

No it doesn't. You aren't forced to scan backwards.

cygx 1605 days ago

> Flags are not that hard, they're a very specific block combining in very predictable way.

But before their introduction, you could decide if there's a grapheme cluster break between codepoints just by looking at the two codepoints in question. Now, you may need to parse a whole sequence of codepoints to see how flags pair up.
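To make that concrete, here is a small Python sketch (not a full UAX #29 segmenter; `ri` and `flag_clusters` are made-up names, and everything that is not a regional indicator is treated as its own cluster). The same adjacent codepoints, indicator R followed by indicator U, form one flag in one string and straddle a boundary in another, so the pair alone does not settle the question.

```python
def is_ri(ch: str) -> bool:
    # Regional indicator symbols: U+1F1E6 .. U+1F1FF
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def ri(letters: str) -> str:
    """Map ASCII letters A-Z to the matching regional indicator symbols."""
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in letters)


def flag_clusters(text: str) -> list:
    """Pair regional indicators left to right; a leftover one stands alone.

    Simplification: all other codepoints become single-codepoint clusters.
    """
    clusters = []
    i = 0
    while i < len(text):
        if is_ri(text[i]) and i + 1 < len(text) and is_ri(text[i + 1]):
            clusters.append(text[i:i + 2])
            i += 2
        else:
            clusters.append(text[i])
            i += 1
    return clusters


# R followed by U is a single cluster here...
print(flag_clusters(ri("RU")) == [ri("RU")])                # True
# ...but with one indicator in front, R and U land in different clusters.
print(flag_clusters(ri("FRUS")) == [ri("FR"), ri("US")])    # True
```

Whether there is a break between two indicators depends on how many indicators precede them, which is why a purely pairwise rule is no longer enough.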
