by maoeurk
1288 days ago
I’m working on a website where intermediate learners practice by reading and listening. It supports Japanese, which is also my strongest second language, so I can say a bit about why it’s so uncommon. Japanese is just really, really hard for computers to deal with. The only reason I got parsing and word segmentation to be pretty good is that I’m very familiar with the language and wrote a 3000-line post-processing function over the tokens to get reasonable results. We have a few similar post-processing steps for other languages, like one to better handle separable verbs in German, but they’re nothing compared to what we needed for Japanese. Japanese also kind of breaks our word model: even though we were aware of that and planned for it from the start, every part of the app needs special logic to support Japanese properly. It’s a lovely language, honestly my favorite of the ones I’ve spent time with, but handling it in code is non-trivial. Happy to answer any questions. Also, self-promo: https://polyglatte.com is my project. I’m happy to make improvements to better support you and the intermediate-reader use case, just let me know.
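To give a rough idea of what a token post-processing step can look like, here is a minimal sketch. It is not the actual function from the project: the `Token` type, the coarse tag names (`verb`, `aux`, and so on), and the single merge rule are all assumptions made up for illustration, and the tokens are written by hand rather than produced by a real tokenizer. The idea is that a morphological analyzer tends to split a conjugated verb into a stem plus auxiliaries, and a learner-facing app probably wants them glued back into one word.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str  # the text of the token
    pos: str      # coarse part-of-speech tag (hypothetical tag set)


def merge_verb_endings(tokens: list[Token]) -> list[Token]:
    """Attach each auxiliary token to the verb it directly follows."""
    merged: list[Token] = []
    for tok in tokens:
        if tok.pos == "aux" and merged and merged[-1].pos == "verb":
            # Glue the auxiliary onto the previous verb and keep the result
            # tagged as a verb so further auxiliaries can chain onto it.
            prev = merged.pop()
            merged.append(Token(prev.surface + tok.surface, "verb"))
        else:
            merged.append(tok)
    return merged


# Hand-written tokens for 私は寿司を食べました ("I ate sushi").
tokens = [
    Token("私", "pronoun"),
    Token("は", "particle"),
    Token("寿司", "noun"),
    Token("を", "particle"),
    Token("食べ", "verb"),
    Token("まし", "aux"),
    Token("た", "aux"),
]

words = [t.surface for t in merge_verb_endings(tokens)]
print(words)
```

Here the three pieces 食べ, まし and た come back as the single word 食べました. A real pipeline would need many more rules than this one, which is roughly where the line count comes from.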