Hacker News
by golergka 4601 days ago
Several characters, yes. And those characters, in turn, can each be represented as a high and low surrogate pair in UTF-16.

http://apps.timwhitlock.info/emoji/tables/unicode

Look for flags and numbers. Here's the German flag in UTF-8: \xF0\x9F\x87\xA9\xF0\x9F\x87\xAA. That's 8 bytes, 2 Unicode code points, 4 UTF-16 code units.
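
The counts can be checked with a quick Python sketch. It takes the bytes quoted above, decodes them as UTF-8, and counts the bytes, code points, and UTF-16 code units; the variable names are only illustrative.

```python
import struct

# The byte sequence quoted in the comment, interpreted as UTF-8.
raw = b"\xF0\x9F\x87\xA9\xF0\x9F\x87\xAA"
flag = raw.decode("utf-8")

utf8_bytes = len(raw)                      # number of UTF-8 bytes
code_points = [ord(c) for c in flag]       # Unicode code points
utf16 = flag.encode("utf-16-le")           # UTF-16, little-endian, no BOM
utf16_units = struct.unpack(f"<{len(utf16) // 2}H", utf16)

print(utf8_bytes)                               # 8
print([f"U+{cp:04X}" for cp in code_points])    # ['U+1F1E9', 'U+1F1EA']
print([f"{u:04X}" for u in utf16_units])        # ['D83C', 'DDE9', 'D83C', 'DDEA']
```

Each of the two code points lies outside the Basic Multilingual Plane, so UTF-16 needs a high surrogate (D83C) followed by a low surrogate (DDE9 or DDEA) for each one.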

2 comments

This is not as strange as it might look at first glance.

A lot of ordinary characters can be represented as two (or more) Unicode code points - for instance an unaccented Latin letter and a combining accent.
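
As a small illustration of that point (my own example, using "é" as the accented letter), Python's standard unicodedata module can show the two-code-point form and its precomposed equivalent:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"

# NFC normalization folds the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 2 1
print(decomposed == composed)           # False: different code point sequences
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```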

Flag emoji seem more like a hack on the side of the font or text renderer. If you look at the Unicode representation, it actually spells out the ISO country code, using regional indicator symbols rather than plain Latin letters. Some fonts probably define a ligature for each such pair that looks like a flag instead of two separate letter symbols.
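
To make the "spells out the country code" part concrete, here is a minimal sketch. It assumes the flag is built from the regional indicator symbols U+1F1E6 to U+1F1FF, which map one-to-one onto A to Z; the helper names are made up for illustration.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag_to_country_code(flag: str) -> str:
    """Map each regional indicator symbol back to its ASCII letter."""
    return "".join(chr(ord(c) - REGIONAL_INDICATOR_A + ord("A")) for c in flag)

def country_code_to_flag(code: str) -> str:
    """Map each ASCII letter A-Z to its regional indicator symbol."""
    return "".join(chr(ord(c) - ord("A") + REGIONAL_INDICATOR_A) for c in code.upper())

german_flag = "\U0001F1E9\U0001F1EA"
print(flag_to_country_code(german_flag))          # DE
print(country_code_to_flag("DE") == german_flag)  # True
```

Whether the pair shows up as a flag or as two boxed letters is then up to the font and renderer.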

The representation of digits inside keycaps also makes sense to me: it's a normal digit eight (dating back to ASCII) plus a combining character that looks like a keycap.
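
A quick way to see this, assuming the combining character meant here is U+20E3, is to ask Python for the Unicode names of the code points in the sequence:

```python
import unicodedata

# Digit eight followed by U+20E3; the emoji-style sequence commonly also
# includes U+FE0F (a variation selector) between the two.
keycap_eight = "8\u20E3"

names = [unicodedata.name(c) for c in keycap_eight]
print(names)  # ['DIGIT EIGHT', 'COMBINING ENCLOSING KEYCAP']

# The enclosing keycap is a combining mark (general category "Me").
print(unicodedata.category("\u20E3"))  # Me
```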

Are these handled in the font as ligatures?
In what UI framework? When I worked on that, I decided to render them from a different texture that doesn't depend on the current font, but scales to its size.