Hacker News new | ask | show | jobs
by wahnfrieden 1404 days ago
no spaces in japanese, it's tricky to determine word boundaries and conjugations and multiple "spellings" of the same word. the libs that tokenize and de-inflect languages are usually highly specialized technologies for east asian languages (maybe others but i'm particularly familiar with japanese, korean, chinese)