Hacker News
GRPO explained: Group Relative Policy Optimization for LLM fine-tuning (cgft.io)
1 point by kumama 65 days ago