by earthboundkid
1129 days ago
Yes, I’ve also seen it struggle with ROT13. I’ve read that this is because the tokenizer breaks up words in a way that makes mapping to ROT13 hard. I don’t expect it to have much luck with a language with a small corpus.

It struggles with ROT13 because people don't generally publish large corpora of ROT13 text next to the plaintext, so the problem compounds. On one hand, the tokenizer probably recognizes very few ROT13'd words; on the other hand, even if it did, the model wouldn't have been trained to predict the correct translation after those tokens, because there are very few ROT13 Rosetta stones just lying around.
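
For anyone unfamiliar with the cipher itself, here is a minimal sketch in Python. The function name `rot13` and the translation table `_ROT13_TABLE` are my own for illustration. The point is that ROT13 is a trivial per-character substitution, which is exactly the granularity a model working on multi-character tokens doesn't naturally see.

```python
import string

# Build a translation table that shifts each ASCII letter 13 places,
# wrapping around the alphabet. Non-letters are left untouched.
_lower = string.ascii_lowercase
_upper = string.ascii_uppercase
_ROT13_TABLE = str.maketrans(
    _lower + _upper,
    _lower[13:] + _lower[:13] + _upper[13:] + _upper[:13],
)


def rot13(text: str) -> str:
    """Apply ROT13 to every ASCII letter in text (illustrative helper)."""
    return text.translate(_ROT13_TABLE)


if __name__ == "__main__":
    encoded = rot13("Hello")
    print(encoded)         # Uryyb
    print(rot13(encoded))  # Hello (ROT13 is its own inverse)
```

Because 13 is half of 26, applying the function twice returns the original text, so the same table handles both encoding and decoding.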