by apta 2400 days ago
> (ی vs ي)

Arabic has both, but they're pronounced differently from Farsi. (ي) is a (y) sound (like seed) whereas (ی) is either an (a) sound (like bat) or an (ay) sound (like may).


> Arabic has both

Not really. Arabic has U+0649 (Arabic Letter Alef Maksura), while Farsi has U+06CC (Arabic Letter Farsi Yeh). Standalone, they look similar, even identical depending on the font. Inside a word, though, it gets more complicated.
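If you want to check the names yourself, here is a small sketch using Python's standard unicodedata module. The describe helper is just something I made up for the printout, and I threw in U+064A (the dotted Arabic yeh) for comparison.

```python
import unicodedata

def describe(ch):
    """Format one character as 'U+XXXX NAME' from the Unicode Character Database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Alef maksura, dotted Arabic yeh, and Farsi yeh are three distinct characters.
for ch in ("\u0649", "\u064A", "\u06CC"):
    print(describe(ch))
# U+0649 ARABIC LETTER ALEF MAKSURA
# U+064A ARABIC LETTER YEH
# U+06CC ARABIC LETTER FARSI YEH
```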

The important difference between U+0649 and U+06CC is how they look when they are connected to other letters. The former is always dotless. The latter is dotless only when it does not join to a following letter, i.e. the letter on its left, since the script runs right to left. Here is an example:

U+0649 (Arabic): ى لى ىد لىد

U+06CC (Farsi): ی لی ید لید
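The positional forms behind those examples can be looked up too. This is only a rough sketch: it scans the two Arabic Presentation Forms blocks for compatibility characters whose decomposition points back at a given letter. The positional_forms helper is hypothetical, and the character data does not say anything about dots. Whether a form is drawn dotted is decided by the glyphs in the code charts and fonts, so this only shows that the two letters are tracked separately, each with its own forms.

```python
import unicodedata

POSITIONS = ("isolated", "final", "initial", "medial")

def positional_forms(base):
    """Map position name -> code point of the presentation form that decomposes to `base`."""
    target = f"{ord(base):04X}"
    found = {}
    # Arabic Presentation Forms-A (FB50..FDFF) and Forms-B (FE70..FEFF)
    for block in (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00)):
        for cp in block:
            # Decompositions look like '<initial> 06CC'; unassigned code points give ''.
            parts = unicodedata.decomposition(chr(cp)).split()
            if len(parts) != 2 or parts[1] != target:
                continue
            position = parts[0].strip("<>")
            if position in POSITIONS:
                found[position] = f"U+{cp:04X}"
    return found

for base in ("\u0649", "\u06CC"):
    print(f"U+{ord(base):04X}", positional_forms(base))
```

Normal text should still use the base letters and leave the choice of form to the shaping engine. The presentation forms exist for compatibility.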

It's kinda similar to how Turkish I's are not the same as English I's. The English pairing of capital and small forms is different from the Turkish one, so different code points are necessary:

English: I i

Turkish (dotless): I ı

Turkish (dotted): İ i

Because Turkish uses separate letters for capital and small forms, only the forms that differ from English (İ and ı) have their own code points. In Farsi and Arabic, the different forms of a letter are not separate characters. They are contextual shapes picked by the font and the shaping engine (strictly speaking not ligatures), so two letters that shape differently each need their own code point. You cannot reuse U+0649 for U+06CC.
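The Turkish side is easy to poke at from Python, whose str.lower() and str.upper() use the default, locale-independent mappings (so they follow the English pairing). The turkish_lower and turkish_upper helpers below are hypothetical names for a minimal workaround, not a full locale-aware implementation.

```python
# Default case mapping pairs I with i, as in English.
print("I".lower() == "i")             # True
print("\u0131".upper() == "I")        # True: dotless ı uppercases to plain I
print("\u0130".lower() == "i\u0307")  # True: İ lowercases to i + combining dot above

def turkish_lower(s):
    """Lowercase with the Turkish pairing: I -> ı and İ -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

def turkish_upper(s):
    """Uppercase with the Turkish pairing: i -> İ and ı -> I."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()

print(turkish_upper("istanbul"))  # İSTANBUL
print(turkish_lower("IĞDIR"))     # ığdır
```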

So to recap, Turkish has dotted İ and dotless I and they always retain their dot status. English has one I that is written with or without a dot depending on its case: the capital has no dot, the small form does.

Arabic has dotted ي and dotless ى and they always retain their dot status. Farsi has one ی that will be written with or without dots depending on how it is placed in the word.

Makes sense. Historically, all Arabic letters were dotless as you probably know. I wonder if this made it into Farsi script somehow, for this case at least.