by CamperBob2 501 days ago
Also, well - there's the technicality of "you don't 'win' a conversation like you can 'win' at Go", so how would you know to reward the model as you're training it?

https://i.imgur.com/CBmMSqO.png, perhaps