by mchaver 3504 days ago
Written text in SMS and Twitter tends to be a lot closer to the way people speak, whereas newspapers are a lot more "polished". So there is definitely a bias depending on what you train your algorithm on. Also, topics and vocabulary may differ widely, so if you train on newspaper text, your algorithm will struggle with phone conversations.

If you are talking about Modern Spoken Mandarin (or written Spoken Mandarin: SMS, social media) vs Modern Written Mandarin, I don't think the gap is that large compared to other languages. It is certainly a lot smaller than the gap between written Colloquial English and Formal English (which uses more words of Latin origin).

Looking at the People's Daily website (which is presumably an official news source in China), it looks like standard newspaper Chinese. It should be readable for most Chinese people with at least a primary education.

1 comment

"I don't think the gap is that large compared to other languages"

As someone learning Chinese, I can sympathize with Google Translate. Spoken Mandarin doesn't give you nearly as much context as more modern written Mandarin. I have no problem reading a newspaper, but real conversation between Chinese people is just lost on me. It's not just a matter of listening pace; too much of the sentence simply isn't said out loud.