Hacker News
by thyselius 996 days ago
Wonderful to learn more about Unicode.

Does anyone know how to write a function (preferably in Swift) to remove emoji from a string? This is surprisingly hard when the string can be in any language, such as English or Chinese.

There have been multiple attempts on Stack Overflow, but they all miss some emoji, because Unicode is so complex.


I haven't tried it, but use libicu (ICU): split the text into grapheme clusters and remove any cluster that starts with a code point whose script is Zsye (the ISO 15924 code for emoji-variant symbols). There should be Swift bindings.
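Swift's `String` already iterates over extended grapheme clusters (`Character`), so the splitting step may not need ICU bindings at all. The sketch below swaps the ICU script check suggested above for a heuristic built on `Unicode.Scalar.Properties`: a `Character` is treated as emoji if any of its scalars defaults to emoji presentation, or if it contains VARIATION SELECTOR-16 (U+FE0F) together with a scalar that has the Emoji property. The helper names `isEmojiCharacter` and `removingEmoji(from:)` are hypothetical, and the heuristic is an assumption, not a Unicode-defined algorithm.

```swift
// Hypothetical helper: heuristic test for whether one grapheme cluster is an emoji.
// Assumption: "emoji" means default emoji presentation, or an explicit VS16 request.
func isEmojiCharacter(_ character: Character) -> Bool {
    let scalars = character.unicodeScalars
    // Rule 1: any scalar that defaults to emoji presentation
    // (e.g. U+1F600, regional-indicator flag letters, skin-tone modifiers).
    if scalars.contains(where: { $0.properties.isEmojiPresentation }) {
        return true
    }
    // Rule 2: an Emoji scalar explicitly requested in emoji style with
    // VARIATION SELECTOR-16, e.g. U+2764 U+FE0F (red heart) or the
    // keycap sequence "1" U+FE0F U+20E3.
    let variationSelector16: Unicode.Scalar = "\u{FE0F}"
    return scalars.contains(variationSelector16)
        && scalars.contains(where: { $0.properties.isEmoji })
}

// Hypothetical helper: keep every grapheme cluster the heuristic does not flag.
// Working per Character drops ZWJ sequences, flags and keycaps as whole units.
func removingEmoji(from text: String) -> String {
    text.filter { !isEmojiCharacter($0) }
}

print(removingEmoji(from: "text \u{1F600} 0123 漢字"))
```

With this heuristic, plain digits and CJK text survive, while `"I \u{2764}\u{FE0F} Swift"` becomes `"I  Swift"`. A bare U+2764 with no U+FE0F is kept, because it defaults to text presentation; also dropping every non-ASCII scalar with `isEmoji` would catch it, at the cost of removing symbols such as © and ™, which have the Emoji property too.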
Here's a one-liner; it produces the string "text  0123 漢字" (the spaces on either side of the removed emoji both remain):

`String("text EMOJI 0123 漢字".unicodeScalars.filter({ !$0.properties.isEmojiPresentation }))`

(I've had to substitute EMOJI for a smiley face, because HN is bad at text encoding.)

Thanks. Unfortunately, both `.isEmojiPresentation` and `.isEmoji` leave many emojis out, like the red heart and many others.
Those aren't inherently emojis; the font just shows them as emojis, so you'd have to render the text.
Correct. `isEmojiPresentation` checks if, per the Unicode standard, this scalar should default to an emoji presentation.
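To see the distinction drawn above, this sketch prints Swift's `isEmoji` and `isEmojiPresentation` for every scalar in a few samples. U+2764 (red heart) has the Emoji property but not Emoji_Presentation, so the one-liner keeps it; its explicit emoji form is the two-scalar sequence U+2764 U+FE0F, and checking for that variation selector is one way to catch that form without rendering (a bare U+2764 is still up to the font). `isEmoji` is also true for the ASCII digit 0, which is why filtering on it would strip the digits from "0123". The `ScalarEmojiInfo` type and `emojiInfo(for:)` helper are hypothetical names used only for illustration.

```swift
// One scalar's emoji-related Unicode properties, as exposed by Swift.
struct ScalarEmojiInfo {
    let codePoint: String
    let isEmoji: Bool
    let isEmojiPresentation: Bool
}

// Hypothetical helper: list the Emoji and Emoji_Presentation properties
// of every Unicode scalar in the string.
func emojiInfo(for text: String) -> [ScalarEmojiInfo] {
    text.unicodeScalars.map { scalar in
        ScalarEmojiInfo(
            codePoint: "U+" + String(scalar.value, radix: 16, uppercase: true),
            isEmoji: scalar.properties.isEmoji,
            isEmojiPresentation: scalar.properties.isEmojiPresentation
        )
    }
}

// Grinning face, bare red heart, red heart + VS16, and the ASCII digit 0.
for sample in ["\u{1F600}", "\u{2764}", "\u{2764}\u{FE0F}", "0"] {
    for info in emojiInfo(for: sample) {
        print(info.codePoint, "isEmoji:", info.isEmoji,
              "isEmojiPresentation:", info.isEmojiPresentation)
    }
}
```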
It's not a bug; HN deliberately strips emojis.