by PeterisP 1171 days ago
> Similarly, pre-processing can be harmful.

A good example is the initially released BERT-multilingual-uncased model from the first BERT paper, which (without mentioning it anywhere) not only collapsed case but also removed diacritical marks from Latin characters, killing its performance on languages that rely heavily on them.
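To make the failure mode concrete, here is a minimal sketch of that kind of normalization, assuming it amounts to lowercasing followed by Unicode NFD decomposition and dropping combining marks. The function name `strip_case_and_accents` is made up for illustration and is not the model's actual preprocessing code; the word pairs are just examples of distinct words that become identical afterwards.

```python
import unicodedata


def strip_case_and_accents(text: str) -> str:
    """Lowercase, decompose to NFD, then drop combining marks (category Mn).

    Hypothetical helper: an assumption about what "uncased plus
    diacritic removal" looks like, not the original implementation.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Distinct words that collapse into the same string after normalization.
pairs = [
    ("schön", "schon"),  # German: "beautiful" vs "already"
    ("año", "ano"),      # Spanish: "year" vs "anus"
]

for with_mark, without_mark in pairs:
    collapsed = strip_case_and_accents(with_mark) == strip_case_and_accents(without_mark)
    print(with_mark, without_mark, collapsed)
```

Once the marks are gone, nothing downstream of the tokenizer can tell these words apart, so the model has no way to recover the distinction.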