Hacker News
by woodson 793 days ago
Especially with small models, I got very poor results for translation. Trying all kinds of tricks didn't help either (apparently prompting in the target language helps for some). Encoder-decoder models such as FLAN-T5 or MADLAD-400 seemed far superior at equal or even smaller model sizes.

I forget which model it was (LLaMA 3?), but I heard that 95% of its training data was English.