by somerandomone 4066 days ago
Inventing a computer with Chinese/Arabic is inherently harder because it's more difficult to type such characters. Once you pass the punch card stage, now what? Build a huge keyboard with tens of thousands of Chinese characters? It took 13 years for MS-DOS to support typing in Chinese.
5 comments

The first solution that came to my mind: type using strokes.

Here is a list of 11 strokes:

http://www.121chineselessons.com/wp-content/uploads/2012/02/...

Every character's strokes have a predetermined "correct" order.

Once you list out the character's strokes, there are only a couple of possible characters left.
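
A minimal sketch of the stroke idea, assuming a made-up one-letter code per stroke (`h`, `s`, `p`, `n` are hypothetical, as are the names `STROKE_ORDER`, `build_index` and `candidates`) and a ten-character toy table rather than a real dictionary. It only illustrates why a typed stroke sequence tends to leave a short list of candidates:

```python
from collections import defaultdict

# Hypothetical one-letter codes for four basic strokes (not a standard encoding):
# h = horizontal, s = vertical, p = left-falling, n = right-falling
STROKE_ORDER = {
    "一": "h",
    "二": "hh",
    "三": "hhh",
    "十": "hs",
    "工": "hsh",
    "土": "hsh",
    "人": "pn",
    "八": "pn",
    "大": "hpn",
    "木": "hspn",
}

def build_index(stroke_order):
    """Group characters by their full stroke sequence."""
    index = defaultdict(list)
    for char, strokes in stroke_order.items():
        index[strokes].append(char)
    return index

def candidates(index, typed):
    """Return every character whose stroke sequence equals what was typed."""
    return list(index.get(typed, []))

INDEX = build_index(STROKE_ORDER)
```

With this table, `candidates(INDEX, "hsh")` offers both 工 and 土, so the typist would still pick one from a short list, while `"hspn"` narrows to 木 alone.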

A blog post on `Chinese Python`. http://reganmian.net/blog/2008/11/21/chinese-python-translat...

Unfortunately, Chinese Python maps English words to 2-character Chinese words, so the translation requires spaces between the Chinese words. So

    while running:
      guess = int(raw_input('Enter integer: '))
      if guess == number:
        print 'Congratulations'
        running = False
      elif guess < number:
        print 'No, higher'
      else:
        print 'No, lower'
translates to

    當 運行:
      猜測 = 整數(輸入('輸入數字: '))
      如果 猜測 == 數字:
        印出 '恭喜'
        運行 = 假
      假使 猜測 < 數字:
        印出 '錯了, 再大'
      否則:
        印出 '錯了, 再小'
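The word-for-word mapping in the two listings above could be sketched as a simple token substitution. This is only an illustration: the table is read off the listings (it is not taken from the actual Chinese Python project), `MAPPING`, `TOKEN` and `translate` are hypothetical names, and string literals are left untranslated.

```python
import re

# Hypothetical table, read off the two listings above.
MAPPING = {
    "while": "當",
    "if": "如果",
    "elif": "假使",
    "else": "否則",
    "print": "印出",
    "int": "整數",
    "raw_input": "輸入",
    "False": "假",
    "running": "運行",
    "guess": "猜測",
    "number": "數字",
}

# Match either a quoted string (kept as-is) or an identifier (looked up).
TOKEN = re.compile(r"""('[^']*'|"[^"]*")|([A-Za-z_][A-Za-z_0-9]*)""")

def translate(source, table=MAPPING):
    """Replace known English words with their Chinese counterparts."""
    def swap(match):
        if match.group(1) is not None:
            return match.group(1)      # string literal: leave untouched
        word = match.group(2)
        return table.get(word, word)   # unknown identifiers pass through
    return TOKEN.sub(swap, source)
```

Because every replacement is two characters long and sits where the English word was, the spaces between words survive, which is the limitation the next step removes.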
But if we map English words in programming to single Chinese characters, we wouldn't need spaces, so perhaps

    當運:
      猜=整(輸('輸入數字: '))
      如猜==數:
        印'恭喜'
        運=假
      否如猜<數:
        印'錯了, 再大'
      否:
        印'錯了, 再小'
And if we made the syntax more C-like to fully make use of the spaceless syntax

    當運{
      猜=整(輸('輸入數字: '))
      如猜==數{印'恭喜';運=假}
      否如猜<數{印'錯了, 再大'}
      否{印'錯了, 再小'}
    }
One could program on one's smartphone screen on the train!
You can use semicolons in Python to put multiple statements on one line, as well as inline a block next to the colon. So you can remove the braces, too. Added parentheses for Python 3.

    當運:
      猜=整(輸('輸入數字: '))
      如猜==數:印('恭喜');運=假;
      否如猜<數:印('錯了, 再大');
      否:印('錯了, 再小')
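For comparison, the same layout in plain Python 3 with English names. This is a sketch: `check` is a hypothetical helper, and it returns the message instead of printing so the result is easy to inspect. Only simple statements may follow the colon on the same line; a compound statement such as `if` cannot, which is why the `while` body above still needs its own indented lines.

```python
def check(guess, number):
    """Compare a guess with the target using one-line branches."""
    # Simple statements after the colon, separated by semicolons.
    if guess == number: msg = 'Congratulations'; running = False
    elif guess < number: msg = 'No, higher'; running = True
    else: msg = 'No, lower'; running = True
    return msg, running
```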
Hard to really guess what would have happened, but it's possible that a Chinese speaker could have opted to start with one of the romanization systems or zhuyin fuhao and then build a system to input characters at a later point.

For Arabic the challenge is more akin to this type of problem: "create a computer that can handle a cursive form of the Latin alphabet". You could start with something really rudimentary, like allowing Arabic characters only in isolated form. It would be ugly and challenging to read, but a start. It wouldn't be too hard to add the initial, medial and final forms, and eventually diacritics as well.
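
A rough sketch of how the initial, medial and final forms could be chosen, assuming a word made only of basic Arabic letters (no hamza, diacritics or ligatures such as lam-alef). `RIGHT_JOINING` and `contextual_forms` are hypothetical names, and the function only reports which form each letter would take rather than rendering glyphs:

```python
# Letters that connect to the preceding letter but never to the following one:
# alef, dal, thal, reh, zain, waw (written as escapes to avoid bidi confusion).
RIGHT_JOINING = set("\u0627\u062f\u0630\u0631\u0632\u0648")

def contextual_forms(word):
    """Return (letter, form) pairs for a word given in logical order."""
    forms = []
    for i, letter in enumerate(word):
        # Connected to the previous letter only if that letter joins forward.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # Connected to the next letter only if this letter joins forward.
        joins_next = i < len(word) - 1 and letter not in RIGHT_JOINING
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((letter, form))
    return forms
```

For كتاب (kaf, teh, alef, beh) this gives initial, medial, final, isolated: the alef does not connect forward, so the last beh stands alone.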

That's an English way to think about Chinese.
How do the Chinese approach this problem? Even something as simple as looking up a word in a dictionary seems impossible in a language like Chinese, where every word is a unique character, but obviously the people who actually deal with the language will have solutions I've not thought of.
Human ingenuity always finds a way. In Chinese dictionaries, there are simply multiple indices: you can look up characters by pronunciation (if you know the word but not how to write it) or by shape (if you know the character but not how to say it). The latter works because there is a small set of shapes (“radicals”) that are combined to produce new characters.
I can't speak directly to Chinese, but I am studying Japanese, which has a writing system derived from Chinese. Essentially each character can be broken up into smaller components (called radicals). This allows characters to be ordered somewhat 'alphabetically'. In theory, the radicals also carry some semantic and phonetic meaning, although (at least in Japanese) many characters have deviated significantly from what you would expect from the roots.

Also, although Japanese has been (through deliberate effort) simplified to about 2,000 characters needed to be considered literate, it still takes until high school for students to be able to fully read and write all 2,000.

(Disclaimer: Not to argue for who invented computers first.)

Yeah, there is a famous computer geek who invented a super nice Chinese input method named "Cangjie" in Assembly and it can be used with a special Chinese CPU. (Now he is an old man.)

https://en.wikipedia.org/wiki/Chu_Bong-Foo

Two ways. If you know what it sounds like, you look in the phonetic index, which is ordered alphabetically according to the pronunciation; if you know what it looks like, you can look in the radical index, which lists characters by their radicals, ordered by how many strokes they have.
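
The two indices can be sketched as two lookup tables built over the same entries. This is a toy illustration with five characters; `ENTRIES`, `RADICAL_STROKES` and `build_indices` are hypothetical names, tones are dropped from the pinyin, and ordering characters under a radical by total stroke count is an assumption about how a given dictionary arranges them.

```python
from collections import defaultdict

# (character, pinyin without tone, radical, total strokes)
ENTRIES = [
    ("好", "hao", "女", 6),
    ("她", "ta", "女", 6),
    ("村", "cun", "木", 7),
    ("林", "lin", "木", 8),
    ("松", "song", "木", 8),
]

# Stroke counts of the radicals themselves, used to order the radical index.
RADICAL_STROKES = {"女": 3, "木": 4}

def build_indices(entries):
    """Build a phonetic index and a radical index over the same entries."""
    by_sound = defaultdict(list)
    by_radical = defaultdict(list)
    for char, pinyin, radical, strokes in entries:
        by_sound[pinyin].append(char)
        by_radical[radical].append((strokes, char))
    for chars in by_radical.values():
        chars.sort(key=lambda entry: entry[0])  # fewest strokes first
    return dict(by_sound), dict(by_radical)

BY_SOUND, BY_RADICAL = build_indices(ENTRIES)
# Radicals listed by how many strokes they have.
RADICAL_ORDER = sorted(BY_RADICAL, key=RADICAL_STROKES.get)
```

Knowing the sound, `BY_SOUND["lin"]` leads to 林; knowing only the shape, one would find 木 among the four-stroke radicals and then scan `BY_RADICAL["木"]`.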
To answer your question on looking up a character in a dictionary, there are 3 ways, which are 3 indexing systems, all present in a dictionary. One can decide which index to use, either based on its pronunciation or its composition.
Largely, they don't, without a great deal of training. http://pinyin.info/readings/texts/moser.html
Arabic would actually be easier. It has about the same number of letters as the Latin alphabet and no capital letters.