by imjonse
490 days ago
Is it established whether GRPO is essential for this to work as it does, or could other RLHF-class methods provide similar results? My initial (possibly mistaken) impression was that GRPO was one of the ways of mitigating the lack of enormous hardware resources.