|
|
|
|
|
by Destructotor
43 days ago
|
|
I'm not sure the cause was really similar. In the case of language switching, it was caused by malformed supervised training data where the prompt was translated, but the answer was kept in the original language. In the case of goblins, it was due to a biased RL reward model. |
|