Hacker News
by satbyy 3435 days ago
ZWJ and ZWNJ are also common in Indic scripts. They are used to control the appearance of glyphs, for example half-forms and consonant clusters (क्‍ष vs क्ष, both are kṣa). As usual, Wikipedia has good examples.¹ ² The Unicode Standard also contains details about these.
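
To make the kṣa example concrete, here is a rough Python sketch that builds the variants from code points. The helper name `describe` is made up for illustration, and whether you actually see a conjunct, a half-form, or a visible virama depends on your font and shaping engine, so treat the comments as what the sequences request rather than what you will necessarily get.

```python
import unicodedata

ZWJ = "\u200d"     # ZERO WIDTH JOINER
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA


def describe(text):
    """Hypothetical helper: list each code point with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# No joiner: the renderer is free to form the conjunct ligature.
conjunct = KA + VIRAMA + SSA

# ZWJ after the virama: requests the half-form of KA followed by SSA.
half_form = KA + VIRAMA + ZWJ + SSA

# ZWNJ after the virama: requests a visible virama and no joining.
explicit_virama = KA + VIRAMA + ZWNJ + SSA

for label, text in [
    ("conjunct", conjunct),
    ("half-form", half_form),
    ("explicit virama", explicit_virama),
]:
    print(label, text, describe(text))
```

All three strings are the same letters underneath; stripping the joiner from either of the last two gives back the first.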

ZW[N]J as a standalone character or at the beginning of a word is very unusual on a day-to-day basis, so it's understandable that Twitter fails to recognize this pattern.

¹ https://en.wikipedia.org/wiki/ZWJ

² https://en.wikipedia.org/wiki/ZWNJ


Ah ha!

    > When a ZWJ is placed between two
    > emoji characters, it can also result
    > in a new form being shown, such as
    > the family emoji, made up of two adult
    > emoji and one or two child emoji
That makes a lot of sense too. I hadn't put enough thought into how that's implemented, but in retrospect it makes perfect sense.

I noticed that with the new emojis on my MacBook. Some of them, like ‍, are rendered as "guy behind a MacBook" on my PC, but phones that lack the combined emoji show them as "guy emoji" and "computer emoji".

Same for ‍️ (male version of raise hand). On phones without the Emoji, it's just "male emoji" and "female raise hand emoji".
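
Since the emojis get stripped here, a small Python sketch using escapes shows what is going on. The function names `join_emoji` and `fallback_parts` are invented for this example, and the fallback is only an approximation of what a renderer without the combined glyph ends up drawing.

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
VS16 = "\ufe0f"   # VARIATION SELECTOR-16 (asks for emoji presentation)

MAN = "\U0001F468"
WOMAN = "\U0001F469"
GIRL = "\U0001F467"
LAPTOP = "\U0001F4BB"          # PERSONAL COMPUTER
RAISING_HAND = "\U0001F64B"    # HAPPY PERSON RAISING ONE HAND
MALE_SIGN = "\u2642"


def join_emoji(*parts):
    """Hypothetical helper: glue emoji together with ZWJ."""
    return ZWJ.join(parts)


def fallback_parts(sequence):
    """Hypothetical helper: the separate emoji a renderer would show
    if it has no single glyph for the whole sequence."""
    return [part for part in sequence.split(ZWJ) if part]


man_technologist = join_emoji(MAN, LAPTOP)
man_raising_hand = join_emoji(RAISING_HAND, MALE_SIGN + VS16)
family = join_emoji(MAN, WOMAN, GIRL)

for seq in (man_technologist, man_raising_hand, family):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in seq)
    print(code_points, "->", len(fallback_parts(seq)), "separate emoji on fallback")
```

So "guy behind a MacBook" is really man + ZWJ + computer, and a device without that combined glyph just draws the two pieces next to each other.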

/e: oh, HN is stripping Emojis

This made me wonder if anyone had tried combining word2vec with emojis, and then I came across this:

https://github.com/uclmr/emoji2ve

which is a dead link

Apologies, and thanks!