| HN Mirror

ML papers wouldn't bother me if they included specialists from the targeted domain to address the obvious pitfalls. I've analyzed the figures in the blog post and skimmed the paper, and both one novelty claim ("(2) A single massively multilingual model spanning 109 languages and showing cross-lingual transfer even to zeroshot cases.") and one "explanation" ("Such positive language transfer across languages is only possible due to the massively multilingual nature of LaBSE") can be debunked just by looking carefully at the figures, as I did over the past hour. The languages they test on are also poorly selected (6 constructed languages, one duplicate and one macrolanguage), which shows a clear lack of attention to detail and a poor understanding of some basic linguistics notions. But hey, it's an ML paper, it's from Google and it has BERT in the title, so it will get attention and be cited even if it's half-crap.