by reissbaker
371 days ago
Magistral Small seems wayyy too heavy-handed with its RL to me:

\boxed{Hey! How can I help you today?}

They clearly rewarded the \boxed{...} formatting during their RL training, since it makes it easier to naively extract answers to math problems and thus verify them. But Magistral uses it for pretty much everything, even when it's inappropriate (in my own testing as well).

It also forgets to <think> unless you use their special system prompt reminding it to.

Honestly a little disappointing. It obviously benchmarks well, but it seems a little overcooked on non-benchmark usage.
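For anyone wondering what "naively extract answers" looks like in practice, here's a rough sketch. To be clear, this is a guess at the general shape of such a verifier, not Mistral's actual pipeline; `extract_boxed` and `reward` are names I made up for illustration.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper: tracks brace depth so nested LaTeX such as
    \\boxed{\\frac{1}{2}} is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def reward(completion, reference):
    """Hypothetical exact-match reward: 1.0 if the boxed answer equals the reference."""
    answer = extract_boxed(completion)
    if answer is not None and answer.strip() == reference.strip():
        return 1.0
    return 0.0
```

A checker like this returns nothing at all unless the output contains a \boxed{...}, so if the reward is computed this way, a model has a strong incentive to box everything. That would fit the greeting above getting boxed too.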