by vanderZwan
1298 days ago
Honest question: there are plenty of articles on Wikipedia where the different language versions of a page are vastly different (it feels like the majority in my experience, but that's no proof, of course). How would that be useful as training data unless heavily curated?
I also think that the domain and the type of language used on Wikipedia are pretty consistent, which will help a lot with unseen sentences.
By no means are these models bad! It’s just that Wikipedia is a particularly easy test for them.