Using Reinforcement Learning with Verifiable Rewards (RLVR) to improve math and coding success rates seems like an exception.