Hacker News
DeepSeek-R1-Zero learns to reason using reinforcement learning on a base model [pdf] (github.com)
6 points by virde 512 days ago