Hacker News
by JimDabell 835 days ago
I’m just starting to learn Mandarin at the moment and I’ve found LLMs useful for this. I can ask the question “I’m learning Mandarin, can you break this sentence into individual hanzi and explain it?” and it will do a good job of explaining everything. Or I can say something like “I was transcribing an audio recording for an exercise and I wrote X but the answer was Y, what did I do wrong?” and it will tell me that I misheard 妈 (ma, the question particle) as 马 (mǎ, which means “horse”) etc. Or I can paste in a sentence and ask it if it’s grammatically correct, and it will explain what I did wrong and how to correct it.

For instance:

> > I’m learning Mandarin, can you break this sentence into individual hanzi and explain it? 果汁很好喝

> Certainly! Let's break down the sentence "果汁很好喝" into individual hanzi (Chinese characters) and explain what each character and the whole sentence mean.

> - 果 (guǒ) - Fruit

> - 汁 (zhī) - Juice

> - 很 (hěn) - Very; quite

> - 好 (hǎo) - Good

> - 喝 (hē) - to drink

> So, when you put these characters together:

> - 果汁 (guǒ zhī) means "fruit juice."

> - 很 (hěn) is an adverb that intensifies the adjective that follows. When used before an adjective, it indicates a high degree of something, commonly translated as "very" or "quite" in English.

> - 好喝 (hǎo hē) combines "good" (好 hǎo) and "to drink" (喝 hē), which together mean "tasty" or "good to drink."

> So, the sentence 果汁很好喝 translates to "The fruit juice is very tasty" or more directly, "Fruit juice is very good to drink." It's a simple sentence structure expressing that the speaker finds the fruit juice to be quite delicious.

It does idioms too:

> > I’m learning Mandarin, can you break this sentence into individual hanzi and explain it? 吃醋

> Sure, I can help with that. The sentence "吃醋" consists of two characters: "吃" and "醋".

> - 吃 (chī): This character means "to eat" or "to consume".

> - 醋 (cù): This character means "vinegar".

> However, when put together as "吃醋", the phrase doesn't literally mean "eat vinegar". In Chinese, "吃醋" is an idiomatic expression that means to be jealous or envious, usually in the context of a romantic relationship. It can be translated to feeling "sour" over someone's interaction with others, hence the metaphor of "eating vinegar". This phrase showcases how Mandarin often uses vivid imagery to encapsulate emotions and situations.

Even going down to a locally running Mistral 7B gives pretty good results, although it sometimes makes minor mistakes or doesn’t give as thorough an explanation as the larger models. It’s still very useful at my stage, though, where I’m mostly just trying to remember something that isn’t quite fully memorised yet, or exploring what’s adjacent to my current knowledge.


If you're looking for a local translation tool, I wrote a small CC-CEDICT lookup script in shell and JS, then eventually rewrote it in Go so I could run it on whatever computer I was using at the time: https://github.com/alisonatwork/cedict

It just does a dumb greedy search to try to find the longest chain of characters for the translation, but that turns out to work more often than not:

  $ cedict -m 果汁很好喝
  果汁    [guǒ zhī]       /fruit juice/
  很      [hěn]   /(adverb of degree)/quite/very/awfully/
  好喝    [hǎo hē]        /tasty (drinks)/

I wanted to add different matching strategies via a -s flag, like dict(1) has, but I never got round to it because it was easy enough to just force a word break by adding spaces:

  $ cedict -m 入鄉隨俗
  入乡随俗        [rù xiāng suí sú]       /When you enter a village, follow the local customs (idiom); do as the natives do/When in Rome, do as the Romans do/
  $ cedict -m 入鄉 隨俗
  入      [rù]    /to enter/to go into/to join/to become a member of/to confirm or agree with/abbr. for 入聲|入声[rù shēng]/
  乡      [xiāng] /country or countryside/native place/home village or town/township (PRC administrative unit)/
  随俗    [suí sú]        /according to custom/to do as local custom requires/do as the Romans do/ 

Now that I live in Taiwan, it might be neat to update it to output traditional characters, but to be honest I think simplified is easier for foreigners to read and learn.

I don’t mean to derail this thread, but I am considering a stint in Taiwan to learn Chinese. I am wondering how different the accent/dialect is compared to mainland China. Are you a native speaker or did you learn while there?

I am an ethnic European who lived in China for several years and now live in Taiwan. I speak and read Chinese at a level good enough for casual conversations, doing banking/rent/etc and getting by in the workplace, but I am far from an expert.

The accents and dialects are mutually intelligible on both sides of the strait, but there is a fair bit of vocabulary and some pronunciations that clearly mark you as having learned in one place or the other. Think of it as the difference between British and American English.

The bigger difference in my opinion is simplified versus traditional. It is possible to learn one and read the other, but having lived in Taiwan for almost 2 years now, I still find a wall of text in traditional a lot harder to digest. Personally I think simplified is more legible, especially on low-res screens or at a distance, but it's unlikely to change over here for political reasons.

Taiwan is probably a fine place to learn Chinese as a beginner. It's easier to "cheat" due to the English language resources available in the community, but if you make an effort I'm sure you can immerse yourself.

I use GPT-4 for this kind of exposition too. If you haven’t come across Pleco [1] yet, I can highly recommend it. It’s a high-quality Chinese-English dictionary with a lot of helpful features for reading Chinese.

One feature I use a lot is the clipboard reader, which lets you step through Chinese text character by character and shows you the meaning of each character. It can be faster than using an LLM, but you don’t get the whole-sentence meaning or the ability to ask follow-up questions. Still, it’s a great app all round (some features are paid).

[1] https://www.pleco.com/

I wrote https://pingtype.github.io to split sentences into words and translate each word into pinyin and English. That might be easier if you're trying to read a longer section of text.
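
Per-word pinyin output like this, and like the cedict lookups earlier in the thread, needs one small conversion if it comes from CC-CEDICT data: the file writes pinyin with tone numbers ("guo3 zhi1") rather than tone marks ("guǒ zhī"). The Go sketch below applies the usual placement rule: mark "a" or "e" if present, otherwise the "o" of "ou", otherwise the last vowel; tone 5 is the neutral tone and gets no mark. It is an illustration only, not taken from pingtype or cedict; toneMark and numberedToMarked are hypothetical names, and it assumes lowercase syllables with "u:" or "v" standing for ü.

```go
package main

import (
	"fmt"
	"strings"
)

// marks lists each vowel with its diacritic forms for tones 1-4.
var marks = map[rune][4]rune{
	'a': {'ā', 'á', 'ǎ', 'à'},
	'e': {'ē', 'é', 'ě', 'è'},
	'i': {'ī', 'í', 'ǐ', 'ì'},
	'o': {'ō', 'ó', 'ǒ', 'ò'},
	'u': {'ū', 'ú', 'ǔ', 'ù'},
	'ü': {'ǖ', 'ǘ', 'ǚ', 'ǜ'},
}

// toneMark converts one lowercase numbered syllable, e.g. "guo3" -> "guǒ".
func toneMark(syl string) string {
	syl = strings.ReplaceAll(syl, "u:", "ü")
	syl = strings.ReplaceAll(syl, "v", "ü")
	if syl == "" || syl[len(syl)-1] < '1' || syl[len(syl)-1] > '5' {
		return syl // no tone number: leave as-is
	}
	tone := int(syl[len(syl)-1] - '0')
	body := []rune(syl[:len(syl)-1])
	if tone == 5 {
		return string(body) // neutral tone: no mark
	}
	idx := -1
	for i, r := range body { // rule 1: "a" or "e" takes the mark
		if r == 'a' || r == 'e' {
			idx = i
			break
		}
	}
	for i := 0; idx < 0 && i+1 < len(body); i++ { // rule 2: the "o" in "ou"
		if body[i] == 'o' && body[i+1] == 'u' {
			idx = i
		}
	}
	for i := len(body) - 1; idx < 0 && i >= 0; i-- { // rule 3: the last vowel
		if _, ok := marks[body[i]]; ok {
			idx = i
		}
	}
	if idx >= 0 {
		body[idx] = marks[body[idx]][tone-1]
	}
	return string(body)
}

// numberedToMarked converts a space-separated pinyin string syllable by syllable.
func numberedToMarked(pinyin string) string {
	syls := strings.Fields(pinyin)
	for i, s := range syls {
		syls[i] = toneMark(s)
	}
	return strings.Join(syls, " ")
}

func main() {
	fmt.Println(numberedToMarked("guo3 zhi1 hen3 hao3 he1")) // guǒ zhī hěn hǎo hē
	fmt.Println(numberedToMarked("ru4 xiang1 sui2 su2"))     // rù xiāng suí sú
	fmt.Println(toneMark("lu:4"), toneMark("ma5"))           // lǜ ma
}
```

Uppercase vowels are not handled, so a capitalised proper-noun syllable whose marked vowel is the capital letter would come out unmarked.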

The question particle is 吗, not 妈. 妈妈 (māma) is "mama", as in a baby's first word, meaning mother; the word is shared with many other languages.

Haha, believe it or not, that was actually my error, not the LLM’s. I changed the example and missed a word.