Yep. And tbh you probably don't even have to do this; the R1 paper found that just running SFT on the base model with a relatively small number of monolingual reasoning traces was enough for it to get the idea, and iirc they didn't even bother selecting for language specifically in the RL training loop itself.