Hacker News
by mtkd 487 days ago
Details on how DeepSeek (DS) used GRPO for RL rewards

https://medium.com/@sahin.samia/the-math-behind-deepseek-a-d...

1 comment

Thanks!