Hacker News
by pjjpo 11 days ago
Don't want to discourage or anything, but is this mostly pattern matching? Most of the comments here seem to be about corner cases "added to the model", which doesn't feel that novel. Naturally, context-awareness is all about the corner cases.

今日 as こんにちは being denylisted was the biggest point of concern here.

1 comment

That’s a fair concern. It’s a hybrid system, so dictionaries and rules handle the high-confidence cases, and the context model handles selected ambiguous words.

今日 is extremely common, but in the vast majority of ordinary text it’s read きょう. The こんにち reading is much less common and tends to appear in more formal contexts, so I handled it conservatively.

I do plan to unblock more cases like this once I find a way to speed up the context model.
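
For anyone who wants to picture the dispatch described above, here is a rough Python sketch, not the actual implementation. Every name in it (`DEFAULT_READING`, `ALTERNATE_READINGS`, `CONTEXT_DENYLIST`, `resolve_reading`, `toy_model`) is hypothetical, and the toy model is just a stand-in for a real context model. The dictionary answers first, denylisted words such as 今日 always keep their default reading, and only the remaining ambiguous words are passed to the model.

```python
from typing import Callable, List, Optional

# Hypothetical data for illustration only; a real dictionary would be far larger.
DEFAULT_READING = {"今日": "きょう", "一日": "いちにち", "東京": "とうきょう"}
ALTERNATE_READINGS = {"今日": ["こんにち"], "一日": ["ついたち"]}

# Ambiguous words that are deliberately NOT sent to the context model yet.
CONTEXT_DENYLIST = {"今日"}

ContextModel = Callable[[str, str, List[str]], str]


def resolve_reading(
    word: str, context: str, context_model: Optional[ContextModel] = None
) -> Optional[str]:
    """Return a reading for `word`, or None if the dictionary does not know it."""
    default = DEFAULT_READING.get(word)
    if default is None:
        return None

    alternates = ALTERNATE_READINGS.get(word)
    # High-confidence path: unambiguous, denylisted, or no model available.
    if not alternates or word in CONTEXT_DENYLIST or context_model is None:
        return default

    # Ambiguous and allowed: let the model choose among the known candidates.
    candidates = [default, *alternates]
    choice = context_model(word, context, candidates)
    # Fall back to the default if the model returns something unexpected.
    return choice if choice in candidates else default


def toy_model(word: str, context: str, candidates: List[str]) -> str:
    """Stand-in for a real context model: one hand-written cue, nothing more."""
    if word == "一日" and "月一日" in context:
        return "ついたち"
    return candidates[0]


if __name__ == "__main__":
    print(resolve_reading("今日", "今日の社会では", toy_model))
    print(resolve_reading("一日", "四月一日に会いましょう", toy_model))
```

With this layout, unblocking a word later would only mean removing it from the denylist, so the cost of the context model is paid only for the words where it is allowed to run.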