
Hacker News


	by CamperBob2 501 days ago
	Also, there's the technicality of "you don't 'win' a conversation the way you can 'win' at Go", so how would you know when to reward the model while you're training it? https://i.imgur.com/CBmMSqO.png, perhaps
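To illustrate the contrast: in a board game, the reward can be computed mechanically from the final position. The sketch below is a minimal illustration only. It uses tic-tac-toe as a stand-in for Go, because scoring a real Go position is far more involved. The `winner` and `terminal_reward` functions are hypothetical and assume the board passed in is already a finished game.

```python
# Hypothetical sketch: tic-tac-toe stands in for Go to show that a game's
# reward is a deterministic function of the final board.
from typing import Optional, Sequence

# All eight winning lines on a 3x3 board, indexed 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: Sequence[str]) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def terminal_reward(board: Sequence[str], player: str) -> float:
    """Reward for `player` on a finished board: +1 win, -1 loss, 0 draw.

    Assumes the game is over; an unfinished board with no winner also
    returns 0.0.
    """
    w = winner(board)
    if w is None:
        return 0.0
    return 1.0 if w == player else -1.0


# X wins along the top row, so the training signal is unambiguous.
final = ["X", "X", "X",
         "O", "O", " ",
         " ", " ", " "]
print(terminal_reward(final, "X"))  # 1.0
print(terminal_reward(final, "O"))  # -1.0
```

A conversation has no rule that plays the role of `terminal_reward`, so the signal would have to come from somewhere else.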