Hacker News new | ask | show | jobs
by batterseapower 2809 days ago
The value of large corpora for translation may be diminishing.. In particular, Facebook have achieved impressive results using unsupervised ML for translation: https://code.fb.com/ai-research/unsupervised-machine-transla...

The basic idea is to use word vector embeddings to build a source<->target dictionary, then combine this with a language recognition model to iteratively bootstrap a set of source<->target training examples for use with a conventional ML approach.

1 comments

So the value of a large corpus remains, it's just this one happens to be generated, as opposed to collected.