by scribu
449 days ago
If the base models already have the “reasoning” capability, as they claim, then it’s not surprising that they were able to reach SOTA using a relatively small amount of compute for RL fine-tuning. I love this sort of “anti-hype” research. We need more of it.