Hacker News
by jacobr1, 725 days ago
One wrinkle is that it is now common to fine-tune on previously derived RL datasets, using the tested inputs and the preferred sample outputs as the training data.
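
To make that concrete, here is a rough sketch of what reusing an RL preference dataset for supervised fine-tuning might look like. The record layout (`prompt`, `chosen`, `rejected`) and the function name `preference_to_sft` are assumptions for illustration, not any particular library's format: the tested input becomes the prompt, and only the preferred output is kept as the target.

```python
from typing import Dict, List

def preference_to_sft(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn preference records into plain (prompt, completion) pairs.

    Assumes each record has hypothetical keys "prompt", "chosen", and
    "rejected". The rejected output is simply dropped.
    """
    sft_rows = []
    for record in records:
        sft_rows.append({
            "prompt": record["prompt"],        # the tested input
            "completion": record["chosen"],    # the preferred sample output
        })
    return sft_rows

# Made-up example data, purely for illustration.
example_records = [
    {"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Capital of France?", "chosen": "Paris", "rejected": "Lyon"},
]

sft_rows = preference_to_sft(example_records)
```

The resulting rows can then be fed to an ordinary supervised fine-tuning loop, with no reward model or RL step involved at that stage.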