Hacker News
by Y_Y 652 days ago
RLHF is one thing, but now that the training is done it has no bearing on whether or not you can show the chain of thought to the user.