by mortimerp9 936 days ago
Hi, I work on Seamless. What this refers to is added-toxicity mitigation. We try to detect the level of toxicity in the input and make sure that the output toxicity level is not higher. This protects the model from making egregious errors in the translation.

There are more details in the paper, and the mitigation code is all open source if you want to check what it actually does.


That's an awesome feature. I think one of the worst possible outcomes of machine translation is something that ends up being accidentally offensive, and this is a smart way to mitigate that.
> one of the worst possible outcomes of machine translation is something that ends up being accidentally offensive

The Hitchhiker's Guide To The Galaxy claims the opposite:

"Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

Or maybe we'll finally come around to the idea that being offended by words doesn't make a lot of sense.
This will happen at the same time we stop being uplifted by words, or moved by them, or brought to tears by them, or fall in love over them.
I'm sure you can understand why translating "I love you" to "I love you, bitch" is probably undesirable.
How do you account for colloquial (non-English) language which could be naively misconstrued as toxic?

e.g. "geil" (either cool or horny depending on usage) in German

It's not fundamentally different from e.g. "wicked" in English, but the biggest bias that potentially all these ML models exhibit is a predisposition towards Anglophoneism.

Our goal is to have good recall, sometimes to the detriment of precision, so a word with multiple meanings might be flagged as toxic even when it is not toxic in the context it is used in. The toxicity mitigation algorithm will search for alternative translations that have the correct meaning but not the potentially toxic word, so that there is no added toxicity in the output. This means that sometimes the model might prefer a less colloquial phrasing than a human would.

You can find details on how the multi-language creation of the toxicity lists was done in section 7.3 of the NLLB paper: https://arxiv.org/pdf/2207.04672.pdf. TL;DR: it's not just a translation of a base English list. Even though we started from that, each language has a curated list that was built by professional translators.
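
To make the "search for alternative translations" idea concrete, here is a rough sketch in Python. It is not the Seamless code (that is in the open-source repo). It assumes the model can hand back several candidate translations ranked best-first, and the word lists, language codes, and function names are all made up for illustration.

```python
import re

# Hypothetical per-language word lists, for illustration only.
# The real lists are curated per language and are not reproduced here.
TOXIC_WORDS = {
    "eng": {"bitch"},
    "fra": {"salope"},
}


def toxic_count(text, lang):
    """Count tokens of `text` that appear in the word list for `lang`."""
    tokens = re.findall(r"\w+", text.lower())
    words = TOXIC_WORDS.get(lang, set())
    return sum(1 for token in tokens if token in words)


def pick_translation(source, src_lang, candidates, tgt_lang):
    """Return the best-ranked candidate that adds no toxicity.

    `candidates` is assumed to be sorted best-first by the model.
    If every candidate adds toxicity, fall back to the least toxic one.
    """
    budget = toxic_count(source, src_lang)
    for candidate in candidates:
        if toxic_count(candidate, tgt_lang) <= budget:
            return candidate
    return min(candidates, key=lambda c: toxic_count(c, tgt_lang))


candidates = ["I love you, bitch", "I love you"]
print(pick_translation("je t'aime", "fra", candidates, "eng"))
print(pick_translation("je t'aime, salope", "fra", candidates, "eng"))
```

In the first call the source has no listed word, so the top-ranked candidate is skipped in favour of the clean one. In the second call the source already contains a listed word, so the top-ranked candidate is kept as-is.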

That's significantly less myopic than I pessimistically assumed. Thanks!
Is there a way to turn it off? If you're translating an R-rated movie with criminals who swear a lot, is it possible to get output without the toxicity filter, to make sure it's being translated properly?
It only kicks in if the output is more "toxic" than the input. If the input has a lot of swear words and the output has the same amount, then it will be left alone.
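
Put as code, the rule described here is just a comparison of two counts. This is only a sketch of the condition as stated in this comment, with hypothetical function names; how the counts are actually produced is up to the detector.

```python
def added_toxicity(source_hits, output_hits):
    """Number of toxic items in the output beyond what the input had."""
    return max(0, output_hits - source_hits)


def needs_mitigation(source_hits, output_hits):
    """Mitigation only triggers when the output is more toxic than the input."""
    return added_toxicity(source_hits, output_hits) > 0


# A sweary movie line translated with the same amount of swearing: left alone.
print(needs_mitigation(source_hits=5, output_hits=5))
# A clean sentence that picked up a swear word in translation: mitigated.
print(needs_mitigation(source_hits=0, output_hits=1))
```

Note that under this rule an output with fewer hits than the input does not trigger anything either.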
What about the inverse?

Can it make sure that the output toxicity level is not lower than the input?

If not (which I strongly suspect is the case), then that is unacceptable. We cannot fight toxic narratives with ignorance.

> What this refers to is added toxicity mitigation.

Oh, well that clears it up! </snark>

I don't see any definition of 'toxicity' on the landing page - it seems to be one of those 'I know it when I (hear) it' kind of words... unless there's some widely-accepted definition in this area of study?

Sorry if I wasn't clear. Internally we've been talking about it a lot, and I forgot that it doesn't have such a solid definition outside of our work. Thankfully, we try to define it in section 7.3 of the NLLB paper: https://arxiv.org/pdf/2207.04672.pdf

The TL;DR is that if you say "Thank you for this job offer", you wouldn't want it to be (mis)translated as "Go f**k yourself". But if you do say "Go f**k yourself", you still want it to be translated as that.