by shagie 1162 days ago
Do other languages have as nice a mapping to tokens?

For example, if you were to go from French, you'd have 33 characters to work with rather than 26 (accents and such). And you'd have chemisier and chemisière, two differently gendered forms of the same word that are used in different contexts.

English tends to not have this difference.

Likewise, French has more verb conjugation forms than English does.

If you were to go to Japanese, you'd have the hiragana, katakana and kanji.

While my Anglocentrism may be showing, I'm not sure there is another language that tokenizes as well when it comes to novel character combinations.

    Make up a new word.  Use it in a setence.  Give a definition for it.

    My new word is 'diflubble'. It is the feeling one gets when they are both excited and nervous in anticipation of an upcoming event. 

    For example, I felt diflubble on the morning of my graduation ceremony.

vs:

    Make up a new word in Japanese.  Use it in a setence and give a translation for it.  Give a definition for it.

    My new Japanese word is "keigarou", which means "being full of energy".

    例えば、私は今日、keigarouな気持ちでいます。
    Translation: For example, I am feeling keigarou today.

The thing there is that you can't just make up new kanji, and a made-up word wouldn't be written in hiragana either. So the "new" Japanese word came back in Latin letters instead.
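
A rough way to poke at the script difference, as a sketch only: byte-level BPE tokenizers (the GPT-2 family) start from UTF-8 bytes, so the byte count per character is a crude proxy for how much raw material each script hands the tokenizer. This does not run a real tokenizer, and the helper names `utf8_profile` and `scripts` are made up for illustration.

```python
import unicodedata

def utf8_profile(text):
    """Return (character count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))

def scripts(text):
    """Collect the first word of each character's Unicode name,
    e.g. LATIN, HIRAGANA, CJK. Whitespace is skipped."""
    return {unicodedata.name(ch).split()[0] for ch in text if not ch.isspace()}

# Sample strings taken from the comment above.
samples = {
    "English": "diflubble",
    "French": "chemisi\u00e8re",  # chemisière, with a precomposed è
    "Japanese": "例えば、私は今日",
}

for label, text in samples.items():
    chars, nbytes = utf8_profile(text)
    print(label, chars, nbytes, sorted(scripts(text)))
```

The Latin-script samples stay at or near one byte per character, while every character in the Japanese sample takes three bytes. Actual token counts depend on the tokenizer's learned merges, so treat this as a hint rather than a measurement.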