by detaro
3386 days ago
e.g. in UTF-8 a codepoint is encoded with a varying number of bytes (so you have to split into codepoints first and then reverse), and, a lot more difficult, a sequence of multiple codepoints can combine to form a single symbol. The simplest case would be something like "ö" encoded as "o" (U+006F) followed by a combining diaeresis (U+0308). Other fun special cases: 🇺🇸 is U+1F1FA REGIONAL INDICATOR SYMBOL LETTER U followed by U+1F1F8 REGIONAL INDICATOR SYMBOL LETTER S, and should if possible be displayed as a US flag (otherwise it falls back to the text "US"). Should reversing it create 🇸🇺 (replacing the flag with the characters "SU"), or still show the flag? (I'm not even sure there isn't a case where both orders are valid country codes and it would change to a different flag.) Similarly, emoji can be formed from a sequence with combining characters in between (e.g. a zero-width joiner), which doesn't display correctly if reversed codepoint by codepoint.
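To make the cases above concrete, here is a rough Python sketch comparing a codepoint-by-codepoint reversal with one that keeps a few kinds of sequences together. The helper names (`simple_clusters`, `reverse_by_cluster`, etc.) are made up for illustration, and the grouping rules are a deliberate simplification covering only combining marks, regional indicator pairs, and zero-width-joiner sequences. It is not a full implementation of Unicode grapheme cluster segmentation.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, used to glue emoji into one symbol


def is_regional_indicator(ch):
    # REGIONAL INDICATOR SYMBOL LETTER A..Z
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def simple_clusters(text):
    """Split text into approximate user-visible symbols (simplified rules)."""
    clusters = []
    join_next = False  # True if the previous codepoint was a ZWJ
    for ch in text:
        prev = clusters[-1] if clusters else ""
        attach = False
        if prev:
            if unicodedata.category(ch).startswith("M") or ch == ZWJ:
                # Combining marks, variation selectors and ZWJ stick to the left.
                attach = True
            elif join_next:
                # Whatever follows a ZWJ belongs to the same symbol.
                attach = True
            elif (is_regional_indicator(ch) and len(prev) == 1
                  and is_regional_indicator(prev)):
                # Regional indicators pair up two at a time to form a flag.
                attach = True
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = (ch == ZWJ)
    return clusters


def reverse_by_codepoint(text):
    # Python strings are sequences of codepoints, so slicing reverses those.
    return text[::-1]


def reverse_by_cluster(text):
    return "".join(reversed(simple_clusters(text)))


if __name__ == "__main__":
    word = "o\u0308k"                  # "ö" as o + combining diaeresis, then "k"
    flag = "\U0001F1FA\U0001F1F8"      # regional indicators U + S
    print(ascii(reverse_by_codepoint(word)))  # diaeresis now follows "k"
    print(ascii(reverse_by_cluster(word)))    # diaeresis stays on "o"
    print(ascii(reverse_by_codepoint(flag)))  # indicators swapped to S + U
    print(ascii(reverse_by_cluster(flag)))    # pair kept in original order
```

Whether keeping the flag intact is the "right" answer is exactly the open question; the sketch just shows that the two strategies give different results. For real use, a library that implements the full segmentation rules would be a safer choice than these hand-rolled checks.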