by cvhc
670 days ago
There is one difference between gibberish Chinese and gibberish Latin-character sequences: in Chinese, each character carries some meaning on its own (like a word). So I guess the model may hallucinate output inspired by these meanings. In the case "慢正潤牯" -> "Slow and positive", it actually translated the first two characters literally (慢 -> slow, 正 -> correct/positive/upright). So the equivalent English gibberish would be something like "hast prank bibble done anut me me ions." Google translates this one to "对我而言,恶作剧已经完成了。" (To me, the prank has been done.) in Chinese, a perfectly valid sentence, and to "¿Me has hecho una broma a mí, Bibble?" in Spanish, which also seems valid. I guess the model is (over-)optimized to generate valid outputs. This can be a feature, since it lets the model still translate text that is grammatically invalid but understandable to some degree (for example, text with typos or non-standard Internet language).