Hacker News
DeepSeek R1 Theory Overview (GRPO and RL and SFT) (youtube.com)
2 points by research_pie 497 days ago