by numpad0 19 days ago
> IVD is the best compromise we can have in this situation.

Maybe, but no one is running an ivdfy-filter over every single Japanese document, so the issue persists. One way to make it happen might be to make the Simplified forms the sole canonical forms for the CJK Unified Ideographs, so that everything in that form is classified as Chinese, and to define Japanese script as always being flagged with IVD sequences, though I don't know what the storage and processing implications of that might be. But my point is that maintaining the position that users can optionally choose not to display text in the wrong language, and that Unification issues are merely user errors, doesn't make any sense to me.
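For a rough sense of what such a filter would involve, here is a minimal sketch. It assumes a per-character table saying which variation selector to attach for Japanese text. The table `JA_SELECTORS` and the function names `ivdfy` and `strip_ivs` are hypothetical, and the single table entry is illustrative only. A real filter would need a table derived from a registered IVD collection.

```python
# Hypothetical table: base ideograph -> variation selector to attach for Japanese text.
# The single entry below is illustrative only, not an authoritative mapping.
JA_SELECTORS = {
    "\u845b": "\U000e0100",
}

# Variation Selectors Supplement block, the range used by ideographic variation sequences.
VS_FIRST, VS_LAST = 0xE0100, 0xE01EF


def is_ivs_selector(ch):
    """Return True if ch is a variation selector in U+E0100..U+E01EF."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def ivdfy(text):
    """Append the table's selector after each listed character that has none yet."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        already_tagged = bool(nxt) and is_ivs_selector(nxt)
        if ch in JA_SELECTORS and not already_tagged:
            out.append(JA_SELECTORS[ch])
    return "".join(out)


def strip_ivs(text):
    """Drop every selector in the block, returning plain unified ideographs."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))
```

On the storage side, each tagged character costs one extra code point from a supplementary plane, which is 4 bytes in UTF-8, so a 3-byte kanji becomes 7 bytes once tagged. The processing side is harder to sketch, since anything that compares, searches, or counts characters would also have to be aware of the selectors.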

> Korea would have been better keeping Chinese characters (Hanja) in use.

I can't speak for everyone, but I, for one, regularly encounter machine translation failures in Korean content due to homophones, even with LLM-based translators, in ways that don't happen with Japanese. It manifests as either homonym errors[1] or the MT falling back to phonetic transcriptions that I can't make sense of[2]. Both happen in formal writing such as newspaper web articles as well as in casual social media posts. Since it appears that there's no way this issue could happen with "our" system, it sometimes feels like reverting to that could fix it.

1: (like "plain/plane", had the source been English and this were somehow happening)

2: (like "That arm might be fukuzatukossetsushiteru" had the source been Japanese)


(update: it looks like some person or group was ragebaiting Korean and Japanese Twitter users over the Korean transition to the phonetic Hangul script, in order to earn Twitter impression-incentive money. Those tweets had not reached me when I wrote the comment above, and my opinion that bringing back Kanji/Hanzi could solve some translation/communication issues is not based on whatever they used as fuel, though I fear it might actually have been close to it)