Hacker News
by valine, 498 days ago
The RL is done on problems with verifiable answers. I’m not sure how o1 slop would be at all useful in that respect.