Hacker News
GRPO explained: group relative policy optimization for LLM finetuning (cgft.io)
1 point by kumama 65 days ago