Hacker News
by HarHarVeryFunny 168 days ago
The entire history of RL-trained "reasoning models", from o1 to DeepSeek-R1, is basically just a year old!