by Bayano2 57 days ago
This was true circa GPT-2, less true after RLHF, and not true at all after RLVR. The model is trying to model the distribution of outputs most likely to solve the problem, not the average distribution of its training text.
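A toy sketch of that shift, not how any real model is trained: it assumes the textbook KL-regularized picture, where the tuned distribution is the base distribution reweighted by exp(reward / beta). The candidate answers, probabilities, `beta` value, and the `reweight` helper are all made up for illustration.

```python
import math

def reweight(base_probs, rewards, beta=0.5):
    """Tilt a base distribution toward high-reward outputs.

    Computes p(y) proportional to base(y) * exp(reward(y) / beta), the
    closed-form optimum of KL-regularized reward maximization.
    """
    weights = {y: p * math.exp(rewards[y] / beta) for y, p in base_probs.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

# Hypothetical numbers: a pretrained model answering "what is 17 * 3?"
# The most common answer in the base distribution is wrong.
base = {"51": 0.30, "41": 0.45, "54": 0.25}

# Verifiable reward: 1 if the answer checks out, 0 otherwise.
reward = {y: 1.0 if int(y) == 17 * 3 else 0.0 for y in base}

tuned = reweight(base, reward)

print("base mode: ", max(base, key=base.get))
print("tuned mode:", max(tuned, key=tuned.get))
```

With these made-up numbers the base distribution's most likely answer is the wrong one, while the reweighted distribution puts most of its mass on the answer that passes the check. Smaller `beta` pushes harder toward the verified answer; larger `beta` stays closer to the base distribution.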