Hacker News
by uasi 21 days ago
Got an incorrect result on my first try. Input was 振り仮名変換器の性能が如何程か試してみよう. It returned 如何(どう)程(ほど) instead of 如何(いか)程(ほど).

Regardless, I'm impressed with the tool!


Thanks, this kind of report is very useful.

如何 is context-dependent, and I hadn’t come across this case yet. I’ll add it to the model soon. Really appreciate the report and the kind words.

Don't want to discourage or anything, but is this mostly pattern matching? Most of the comments here seem to be about corner cases being "added to the model", which doesn't feel that novel. Naturally, context-aware conversion is all about the corner cases.

今日 as こんにちは being denylisted was the biggest point of concern here.

That’s a fair concern. It’s a hybrid system, so dictionaries and rules handle the high-confidence cases, and the context model handles selected ambiguous words.

今日 is extremely common, but in the vast majority of ordinary text it’s read きょう. The こんにち reading is much less common and tends to appear in more formal contexts, so I handled it conservatively.

I do plan to unblock more cases like this once I find a way to speed up the context model.
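
For anyone trying to picture the hybrid flow described above, here is a minimal sketch in Python. It is not the tool's actual code: the tables, the `DENYLIST` set, the `toy_context_model` stand-in, and every function name are hypothetical, and the "model" is just a one-line rule so the example stays runnable. It only illustrates the ordering: dictionary first, then a conservative default for denylisted words, then the context model for the remaining ambiguous words.

```python
from typing import Callable, Optional

# Hypothetical data for illustration only; the real tool's tables are not shown in the thread.
DICTIONARY = {"程": "ほど", "性能": "せいのう"}  # high-confidence, single reading
AMBIGUOUS = {  # candidate readings, most common first
    "如何": ["どう", "いか"],
    "今日": ["きょう", "こんにち"],
}
DENYLIST = {"今日"}  # ambiguous words deliberately kept on their default reading

ContextModel = Callable[[str, str, list], Optional[str]]


def toy_context_model(word: str, following: str, candidates: list) -> Optional[str]:
    """Stand-in for a real context model: one hard-coded rule, otherwise no opinion."""
    if word == "如何" and following == "程":
        return "いか"
    return None


def read_token(word: str, following: str, model: ContextModel = toy_context_model) -> str:
    # 1. Dictionary and rules handle the high-confidence cases.
    if word in DICTIONARY:
        return DICTIONARY[word]
    candidates = AMBIGUOUS.get(word)
    if candidates is None:
        return word  # nothing to annotate (kana, punctuation, unknown word)
    # 2. Denylisted words skip the model and keep the most common reading.
    if word in DENYLIST:
        return candidates[0]
    # 3. The context model decides the remaining ambiguous words,
    #    falling back to the most common reading if it has no opinion.
    return model(word, following, candidates) or candidates[0]


def annotate(tokens: list) -> str:
    """Join pre-segmented tokens, adding word(reading) where a reading was found."""
    out = []
    for i, word in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        reading = read_token(word, following)
        out.append(word if reading == word else f"{word}({reading})")
    return "".join(out)
```

With these toy tables, `annotate(["如何", "程", "か"])` gives `如何(いか)程(ほど)か`, while `annotate(["今日", "は"])` stays on `今日(きょう)は` because 今日 never reaches the model. Unblocking a word would then just mean removing it from the hypothetical `DENYLIST`, at the cost of one more model call per occurrence.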