

by denysvitali 1831 days ago

> For example, here are the results of translating the English word "hello":

> Language: fr, Translation: bonjours

> Language: fr, Translation: bonsoir

> Language: fr, Translation: salutations

> Language: it, Translation: buongiorno

> Language: it, Translation: buonanotte

> Language: fr, Translation: rebonjour

> Language: it, Translation: auguri

> Language: fr, Translation: bonjour,

> Language: it, Translation: buonasera

> Language: it, Translation: chiamatemi

Is it just me, or are these machine translations worse than ... Google Translate?

3 comments

beau 1831 days ago

These results are less accurate than Google Translate. But they are far faster to get, and far less expensive to generate (compare https://cloud.google.com/translate/pricing). Our goal here is speed: we want to search through many possibilities as quickly as possible.

The word vectors have been aligned across multiple languages, so a word and its translations sit close together in one shared vector space. Using an approximate nearest neighbor search, we are able to find the vectors nearest to the input in multiple languages very quickly.
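As a rough illustration of the lookup described above, here is a minimal sketch that searches a shared vector space by cosine similarity. It is not the example code under discussion: the vectors are made-up 3-dimensional toy values, the names `VECTORS`, `QUERY`, and `nearest` are hypothetical, and it uses an exact brute-force scan where a real system would use an approximate nearest neighbor index.

```python
import numpy as np

# Toy, made-up 3-dimensional vectors standing in for aligned embeddings.
# Real aligned word vectors have hundreds of dimensions.
VECTORS = {
    "fr": {"bonjour": [0.9, 0.1, 0.0], "chat": [0.0, 0.9, 0.1]},
    "it": {"buongiorno": [0.8, 0.2, 0.1], "gatto": [0.1, 0.8, 0.2]},
}
QUERY = {"hello": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0]}


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def nearest(word, k=1):
    """Return the k closest (language, word, similarity) triples, best first.

    This is an exact scan over every candidate; an approximate index would
    trade a little accuracy for much faster lookups on large vocabularies.
    """
    q = normalize(QUERY[word])
    scored = [
        (lang, cand, float(normalize(vec) @ q))
        for lang, words in VECTORS.items()
        for cand, vec in words.items()
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)[:k]


if __name__ == "__main__":
    for lang, cand, score in nearest("hello", k=2):
        print(f"Language: {lang}, Translation: {cand} ({score:.3f})")
```

Because the search ranks by geometric closeness rather than by meaning, related but wrong words can rank highly, which is consistent with the results quoted at the top of the thread.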

To keep the example simple, we did not try to filter the data through hand-built language dictionaries. Instead, we simply drop words in other languages that also appear in the English .vec file. Words like "ciao" appear frequently enough in otherwise English sentences that the example code drops it from Italian, so it is not shown in the results:

    % curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
    50393:ciao 0.0120 ...

One improvement would be to filter out any words that do not appear in a hand-curated dictionary instead of filtering out words that already appear in English. We decided not to show how to do this because we'd already introduced a few concepts, such as aligned word vectors and approximate nearest neighbor search, and wanted to keep the example as simple as possible.
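The two filtering strategies mentioned in this comment can be compared with a small sketch. Everything here is hypothetical: the function names and the tiny word sets are invented for illustration and are not taken from the example code or from any real dictionary.

```python
def drop_english_overlap(candidates, english_vocab):
    """Current approach as described: drop words that also occur in English."""
    return [w for w in candidates if w not in english_vocab]


def keep_dictionary_words(candidates, dictionary):
    """Suggested improvement: keep only words found in a curated dictionary."""
    return [w for w in candidates if w in dictionary]


# Hypothetical data for illustration only.
english_vocab = {"hello", "ciao", "pizza"}
italian_dictionary = {"ciao", "buongiorno", "buonasera"}
candidates = ["ciao", "buongiorno", "chiamatemi", "buonasera"]

if __name__ == "__main__":
    # "ciao" is lost because it also shows up in the English vocabulary.
    print(drop_english_overlap(candidates, english_vocab))
    # "ciao" survives; words missing from the dictionary are dropped instead.
    print(keep_dictionary_words(candidates, italian_dictionary))
```

With this toy data, the overlap filter removes "ciao" while the dictionary filter keeps it, which is the trade-off the comment describes.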


toxik 1831 days ago

Google Translate is state of the art, so I’m not sure why that would be surprising. That said, is there something wrong with the translations offered?


shakow 1831 days ago

> Google Translate is state of the art

For French/English/German, DeepL is much better IME.


dataflow 1831 days ago

> That said, is there something wrong with the translations offered?

I think in French hello = "bonjour" and hi = "salut"... not sure where "bonjours" and "salutations" came from.


T-A 1831 days ago

The Italian "auguri" means "best wishes"; "chiamatemi" means "call me". Neither is a plausible translation of "hello". The obvious one, "ciao", is missing.


e17 1831 days ago

I thought Hello was invented with the telephone. Prior to that, English greetings were good morning/evening. What do Italians and French say when they pick up the phone? Allora?


shakow 1831 days ago

"Bonjours" doesn't exist, and "salutations" is a tad quirky, but OK in informal settings, especially when addressing many people at once.


tasogare 1831 days ago

No, "bonjours" exists (it's simply the plural form of "bonjour" used as a noun), but the contexts in which it is used are very infrequent, so it's weird to find it in that list.


a1369209993 1831 days ago

Compare "I said my hellos and goodbyes." in English. It's definitely a word, just so uncommon as to be largely irrelevant in most practical contexts.


toxik 1831 days ago

It's very clearly semantically related though? I don't understand the complaint here.


numpad0 1831 days ago

It seems to be a very domain-specific solution: they are trying to suggest alternative words for a customer-requested domain name when it is already taken.

Like you type in “stargazer.com”, the system sees it’s already registered, and returns a “sorry sir it’s taken” page, with similar words listed as “but maybe try these words: astronomer, observatory, telescope, shooting star...”.

So it’s not serious translation, more of an inexpensive quick dictionary search. I guess it’s okay for its intended purposes.


ampdepolymerase 1831 days ago

It would be better to run the vectors through an attention layer if you want sentence-to-sentence translation.
