Hacker News
by ahartman00 423 days ago
>lots of human-translated passages in their corpus

Yes. I remember reading that the EU parliamentary proceedings in particular are used to train machine translation models. Unfortunately, I can't remember where I read that. I did find the dataset: https://paperswithcode.com/dataset/europarl