by keyboard_smash
1132 days ago
Like with any language, there’s gonna be a lot of context-dependent words and phrases with multiple meanings that are hard to segment, parse, or translate correctly. Tools like DeepL or Google Translate weigh probabilities for segmentation and grammar (or use ICU libraries), but that’s much harder to do when all you have to work with is ligatures and basic font engine features. The classic example is 大麻煩: is it 大|麻煩 (a big inconvenience), or is it 大麻|煩 (marijuana annoyance)? Is 粉絲 fan, or vermicelli? Is 早唞! “good night!” or “go fuck yourself!”?
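To make the segmentation point concrete, here's a rough sketch of probability-based splitting for 大麻煩. It is not how DeepL, Google Translate, or ICU actually work internally; the `UNIGRAM` table and every number in it are made up for illustration, and the function names are hypothetical. It just enumerates every split the toy dictionary allows and picks the one with the highest product of word probabilities.

```python
from math import prod

# Hypothetical unigram probabilities, invented for this example only.
UNIGRAM = {
    "大": 0.05,      # big
    "麻煩": 0.01,    # inconvenience / trouble
    "大麻": 0.0005,  # marijuana
    "煩": 0.002,     # annoying
}

def segmentations(text):
    """Yield every way to split text into words that appear in UNIGRAM."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in UNIGRAM:
            for rest in segmentations(text[end:]):
                yield [head] + rest

def best_segmentation(text):
    """Return the candidate split with the highest product of word probabilities."""
    return max(segmentations(text), key=lambda words: prod(UNIGRAM[w] for w in words))

candidates = list(segmentations("大麻煩"))
best = best_segmentation("大麻煩")
print(candidates)
print(best)
```

With these invented numbers the 大|麻煩 reading wins, but only because the table says so. A real system needs corpus statistics and surrounding context, which is exactly the state a ligature table in a font doesn't have.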