Hacker News
Deepseek R1 Zero learns to reason using reinforcement learning on base model [pdf]
(github.com)
6 points by virde 512 days ago