Hacker News
by mtkd, 487 days ago
Details on how DS used GRPO for RL rewards
https://medium.com/@sahin.samia/the-math-behind-deepseek-a-d...
1 comment
quantumspandex
487 days ago
Thanks!