by huac
3496 days ago
Their training dataset is almost certainly biased towards 'formal' Chinese sources, e.g. newspapers, news broadcasts, and so on. This is probably true for every language translation dataset, but at least anecdotally I can confirm the massive disconnect between spoken and written Chinese. It's really interesting culturally, since modern written Chinese is split between Simplified (PRC) and Traditional (HK/TW/etc.), because Mao thought Traditional was too difficult for the proletariat. Yet official national news sources in China are almost always given in formal Chinese, which nobody outside of the elite really speaks!
Go to any USA Today or WSJ article and read a paragraph out loud; no one talks like that.