| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by visarga 1046 days ago
	The model would learn from feedback, not just regurgitate the training set, as long as the model is part of a system that can generate this feedback. AlphaGo Zero had self play for feedback. Robots can check task execution success. Even chatting with us generates feedback to the model.