Hacker News
by CamperBob2, 501 days ago
Also, well - there's the technicality of "you don't 'win' a conversation like you can 'win' at Go", so how would you know to reward the model as you're training it?
https://i.imgur.com/CBmMSqO.png, perhaps