by sshine 1480 days ago
I used to keep track of the state of machine translation some years back.

I think the way you measure the success of an automated translation is edit distance, i.e. how many manual edits you need to make to a translated text before it reaches some acceptable state. I suppose it's somewhat subjective, but it is possible to construct a benchmark that allows for multiple correct results.
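
To make that concrete, here is a rough sketch of such a metric, not any particular benchmark's actual scoring: a word-level Levenshtein distance, taking the minimum over several acceptable reference translations. The function names and the whitespace tokenization are my own assumptions for illustration.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] = edits needed to turn the first i-1 hyp tokens into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # turning i hyp tokens into an empty ref takes i deletions
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a token from the hypothesis
                curr[j - 1] + 1,     # insert a token from the reference
                prev[j - 1] + cost,  # keep or substitute a token
            ))
        prev = curr
    return prev[-1]


def best_edit_distance(hypothesis, references):
    """Smallest distance to any of several acceptable reference translations.

    Assumes naive whitespace tokenization (hypothetical choice for this sketch).
    """
    hyp = hypothesis.split()
    return min(edit_distance(hyp, ref.split()) for ref in references)


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "the cat sits on the mat"]
    print(best_edit_distance("the cat sits on mat", refs))
```

Taking the minimum over references is one simple way to "allow for multiple correct results"; a real benchmark would likely also normalize by length and handle punctuation and casing.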

The best resources I knew back then were:

VISL's CG-3 self-reported a competitively low edit distance compared to Google Translate: https://visl.sdu.dk/constraint_grammar.html -- It is a convincing argument that in order to beat Google Translate, you want less fuzzy machine learning and more structural analysis. But the abstraction unfortunately requires a rather deep knowledge of any one particular language's grammar; having a PhD in computational linguistics helps.

Apertium has an open-source pipeline: https://apertium.org/ -- it seems to be more of an open-source effort, with quality similar to Google Translate (although I don't know whether it's better or worse; probably slightly worse in most cases, and with slightly lower coverage).

1 comments

The VISL translator is not CG-3 - it's GramTrans, with the commercial vendor being GrammarSoft ApS. CG-3 is merely one of the general purpose langtech tools used in the pipeline. Apertium also uses CG-3.

Both GramTrans and Apertium are rule-based. Very similar technology.

(I wrote CG-3, and work for both GrammarSoft and Apertium.)

Thanks for clarifying, Tino.