
by jsnathan 3217 days ago

Yes, there are a number of models. But the quality of the results also depends heavily on the training data, which I wouldn't know how to find or evaluate, and getting good results might require tweaking the algorithms for different languages, which I wouldn't know how to do. So it's not really 'usable' (for me).

I figured someone would have gone to the trouble of combining models with a maintained collection of datasets to produce an open source alternative to Google Translate by now. I've been wondering about that for years and it never seems to happen. Not saying anyone should feel obligated - I'm just curious why we don't see this, when we see so many other open source software projects that are competitive with their commercial alternatives.

Is it difficult/expensive to acquire these datasets? Is it a lot of effort to actually fine-tune the algorithms to reach passable results?

It seems (without knowing the details myself) that the state of the art in actually usable machine translation tools is always locked up in commercial IP, even though it feels (at least to me) like something that should be a free public service and therefore an ideal candidate for the 'open source' treatment.