Hacker News
by tjungblut 323 days ago
If you are curious, like me, about how the actual reinforcement learning happens: it uses verl [1] underneath. The paper "HybridFlow: A Flexible and Efficient RLHF Framework" [2] explains it really well.
[1] https://github.com/volcengine/verl
[2] https://arxiv.org/abs/2409.19256v2