by danielhanchen 490 days ago
Oh yep! The DeepSeek paper also mentioned that large enough LLMs inherently have reasoning capabilities, and that the goal of GRPO is to accentuate those latent skills!