| HN Mirror



	by ProAm 2024 days ago
	I thought Signal came out with their own keyboard and I was excited.

1 comments

xster 2024 days ago

Even if they did, creating a Chinese IME is a really difficult problem space. Since typing is generally done phonetically in mainland China/HK/Taiwan and the real words being typed are ideographic, there's a lot of inference needed to translate between the two.

Considering how quickly the language moves to keep up with internet culture, newsworthy names, new parlance and new memes, an IME has to do the equivalent of staying up to date with Urban Dictionary for users to be able to invoke the latest "lit" colloquialism. This is a full-time job on its own.

nucleardog 2024 days ago

Yep this. Even on computers, most Chinese people I know won’t use the OS’s IME and instead use third party programs.

There are entire companies that exist to solve just this problem that is basically orthogonal to Signal’s purpose and mission. While it would be great for there to be a top tier Chinese IME from someone we trust, it’s by no means an easy task like most people are probably envisioning.

higerordermap 2023 days ago

Serious question and not trolling: Is Chinese/Mandarin very hard to romanize? In India we have Indic IMEs, but romanization works pretty well and it's rare to see people using an Indic IME. The far-south languages especially have fewer consonants, and there are very few ambiguities in how to pronounce a romanized word. Is that not possible for Chinese?

nucleardog 2020 days ago

Not a Chinese speaker, but I know like 10 words.

They already have some romanizations like Pinyin that are largely phonetic.

But take something like "mao". Without any accents to indicate the tone, that could be:

  * 毛 - like a dozen distinct meanings
  * 猫 - a few meanings, but mostly "cat"
  * 冒 - half a dozen meanings
  * 昴 - one meaning
  * 懋 - a few meanings
  * 帽 - a few meanings
  * 貌 - a few meanings
  * 牦 - one meaning
  * 矛 - one meaning
  * 铆 - a couple meanings
  * 锚 - one meaning
  * 贸 - one meaning
  * 茂 - two meanings

To be honest, I got tired of compiling the list at this point. There's lots more.

When someone uses a romanized keyboard to type "m", "a", "o", you've got like two dozen possible characters it could become. If you're trying to figure out from context which one the person might be intending, you need to look at like 60 different possible meanings in context and figure out which characters are most appropriate. And that's assuming the previous several things they've entered have each been narrowed down to one character, which likely still has several meanings.

A lot of (I'd dare say most?) Chinese people do use romanized input (the alternative I've seen is very slowly drawing out each individual character on the screen), but whether the keyboard sees you type "I have a cute... mao" and decides you want 'anchor' or 'cat' has a huge impact on day-to-day usability for people actually using it.
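To make the "cute... mao" case concrete, here is a minimal Python sketch of context-based candidate ranking. Everything in it is an assumption for illustration: the names `CANDIDATES`, `BIGRAM_COUNTS`, `UNIGRAM_COUNTS` and `rank` are hypothetical, the counts are made up rather than taken from a corpus, and a real IME would use a far larger language model.

```python
# Toy candidate ranking for the toneless pinyin syllable "mao".
# All counts are invented illustration values, not real corpus data.

# A subset of the characters that a romanized "mao" could map to.
CANDIDATES = {"mao": ["毛", "猫", "冒", "帽", "锚", "贸"]}

# Hypothetical counts of how often a character follows a given previous word.
BIGRAM_COUNTS = {
    ("可爱的", "猫"): 50,  # "cute" + cat
    ("可爱的", "帽"): 5,   # "cute" + hat
    ("可爱的", "锚"): 1,   # "cute" + anchor
    ("船的", "锚"): 40,    # "ship's" + anchor
    ("船的", "猫"): 2,     # "ship's" + cat
}

# Hypothetical context-free frequencies, used only to break ties.
UNIGRAM_COUNTS = {"毛": 30, "猫": 20, "冒": 10, "帽": 10, "锚": 3, "贸": 8}


def rank(pinyin: str, previous_word: str) -> list[str]:
    """Order candidates by bigram count, then by unigram count."""

    def score(ch: str) -> tuple[int, int]:
        return (
            BIGRAM_COUNTS.get((previous_word, ch), 0),
            UNIGRAM_COUNTS.get(ch, 0),
        )

    return sorted(CANDIDATES.get(pinyin, []), key=score, reverse=True)


if __name__ == "__main__":
    print(rank("mao", "可爱的"))  # cat should come first after "cute"
    print(rank("mao", "船的"))    # anchor should come first after "ship's"
```

With these made-up counts the same keystrokes produce a different first suggestion depending on the preceding word, which is the whole usability question: how often the first candidate is the one the user meant.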

The written language is vastly more complicated than the spoken one as far as I can tell. A syllable that can have 60 meanings is relatively easy to figure out in context, but when written the meaning has to be made explicit. As a really basic example, "ta" is both he and she. So they just don't have masculine and feminine pronouns, right? Wrong. 他 is he, 她 is she. These are said exactly the same so even given accents to indicate tone they're romanized identically. But when written, the distinction is made.
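A small Python sketch of the tone point, under stated assumptions: the `TONED` table is a hand-written toy dictionary using what I believe are the standard Mandarin readings of the characters listed earlier, and `strip_tone` and `merge_toneless` are hypothetical helper names. It shows that typing without tone marks merges the groups for "mao", while "ta" stays ambiguous even with the tone.

```python
import unicodedata
from collections import defaultdict

# Toy dictionary keyed by toned pinyin (hand-written for illustration).
TONED = {
    "māo": ["猫"],
    "máo": ["毛", "牦", "矛", "锚"],
    "mǎo": ["昴", "铆"],
    "mào": ["冒", "懋", "帽", "貌", "贸", "茂"],
    "tā": ["他", "她"],
}


def strip_tone(syllable: str) -> str:
    """Drop tone marks by decomposing and removing combining marks.

    Note: this would also strip the dots from "ü", which is fine for
    this toy table but not for a complete pinyin inventory.
    """
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def merge_toneless(toned: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group characters by what the user types on a plain keyboard."""
    merged: dict[str, list[str]] = defaultdict(list)
    for syllable, chars in toned.items():
        merged[strip_tone(syllable)].extend(chars)
    return dict(merged)


if __name__ == "__main__":
    toneless = merge_toneless(TONED)
    print(len(toneless["mao"]), "candidates for 'mao'")
    print(TONED["tā"], "share the toned syllable 'tā'")
```

Even a keyboard that let you enter tones would only split "mao" into four smaller candidate lists; for 他 and 她 it would not help at all, so something still has to choose based on context.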

And if you make a mistake somewhere in all of this?

Well, baba (爸爸) is dad. baba is also poop (㞎㞎).

Wo ai ni, baba. I love you, poop.

novok 2024 days ago

These are the side effects of Chinese not adopting a phonographic writing system when it could have, or having both like Japanese does, so you don't need to use such an informal, ambiguous layer when actually writing.

jrockway 2024 days ago

That doesn't really solve the problem. Homophones exist, and entering the phonetic representation in some native character set would still require an additional conversion step to disambiguate the homophone. (i.e. in Japanese you can type in hiragana, and many phone IMEs work like that, but you still have to convert that input to kanji; nobody wants to read your long stream of hiragana.) The input methods add value based on how frequently they suggest the correct conversion as the first candidate (based on context).

Imagine typing English by speaking. You can say "flower", but that might get written out as "flour". Some intelligence has to be implemented that picks the right one based on context, or gives you the ability to correct it. That is where the complexity in East Asian input methods comes from.

(Yes, as English-speaking computer users we are very lucky. The exact sort of symbols that readers of English expect map 1:1 to our keyboards. Still kind of a pain on a phone, though!)

bgee 2024 days ago

Native Chinese speaker here.

I'm not sure I understand your point. Even without keeping up with the new memes, an IME still lets people type them. It's simply not as easy when the IME does not auto-suggest the new combination, but users can manually select each Chinese character.

Regarding "an informal ambiguous layer", are you implying there is something more fundamental/low-level than the Chinese characters used in communication? If so, what is that?

yorwba 2024 days ago

Japanese IMEs operate basically the same way as Chinese ones.

novok 2024 days ago

But native OS IMEs in Japan are used frequently, while in China they are not? From what I understand, hiragana -> kanji conversions are formalized and thus are easy to add by dumping a dictionary, while in Chinese, since the phonography is informal, you need to put in more effort to maintain the dictionary, as everyone ends up typing whatever they think it would be in pinyin or similar AFAIK, along with all the dialectal variations.

yorwba 2024 days ago

The paper cited for the 68.3% figure http://web.cse.ohio-state.edu/~lin.3021/file/SEC15.pdf says that third-party IMEs are also "very popular" in Japan and Korea, though they do not cite any statistics. (Their statistics for China are from 2014. The paper was published in 2015.)

Chinese orthography is just as standardized as Japanese and hanzi/pinyin dictionaries are not harder to maintain than kanji/hiragana ones. Some people have trouble with sound distinctions in Standard Mandarin that don't exist in their speech and enter incorrect pinyin (e.g. z instead of j or zh), but that can be treated as a typo, the same as phonetic misspellings in other languages.
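As a rough illustration of treating such confusions as typos, here is a minimal Python sketch of "fuzzy" lookup. The confusable pairs in `FUZZY_INITIALS`, the tiny `TOY_DICT`, and the function names are all assumptions made for this example, not taken from any particular IME.

```python
# Assumed confusable initials (illustrative only).
FUZZY_INITIALS = {
    "z": ["zh"], "zh": ["z"],
    "c": ["ch"], "ch": ["c"],
    "s": ["sh"], "sh": ["s"],
}

# Toy pinyin -> hanzi dictionary, hand-written for the example.
TOY_DICT = {"zhong": ["中", "种"], "zong": ["总"]}


def fuzzy_variants(syllable: str) -> list[str]:
    """Return the typed syllable plus variants with a confusable initial."""
    variants = [syllable]
    # Try two-letter initials first so "zh" is not mistaken for "z".
    for initial in sorted(FUZZY_INITIALS, key=len, reverse=True):
        if syllable.startswith(initial):
            rest = syllable[len(initial):]
            variants += [alt + rest for alt in FUZZY_INITIALS[initial]]
            break
    return variants


def lookup(syllable: str) -> list[str]:
    """Collect candidates for the typed syllable and its fuzzy variants."""
    candidates: list[str] = []
    for variant in fuzzy_variants(syllable):
        candidates.extend(TOY_DICT.get(variant, []))
    return candidates


if __name__ == "__main__":
    print(fuzzy_variants("zong"))
    print(lookup("zong"))
```

Exact matches are listed before fuzzy ones here, so a user who types correct pinyin is not penalized; that ordering is a design choice in this sketch rather than a claim about how real IMEs rank them.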

Support for dialectal variations is essentially nonexistent in mainstream IMEs. People who want to use varieties other than Standard Mandarin would have to use shape-based input (including methods that decompose each character into smaller parts, like Cangjie) or send a voice message. (There are projects to create IMEs for other Sinitic languages, like https://hanhngiox.net/ but almost nobody uses them.)

(Do any Japanese IMEs support non-standard dialects or even other Japonic languages?)

TazeTSchnitzel 2024 days ago

Japanese IMEs are even more complicated than Chinese ones, what are you talking about? (The existence of kana doesn't solve the homophone and homograph problems.)

spacehunt 2024 days ago

Here's a popular, commercial Japanese IME: https://en.wikipedia.org/wiki/ATOK
