| HN Mirror



	by quantumspandex 497 days ago
	AlphaGo seems more like an automated process to me, because you can start from nothing except the algorithm and the rules. Since a Go game almost always has only two outcomes (win or loss), and the model can play against itself, it is guaranteed to learn something during self-play. In the LLM case you need an already capable model before you can do RL. I also feel the problem selection part is important, to make sure the problems are not too hard. So there's still a lot of labor involved.

1 comment

fenomas 497 days ago

Yes, IIUC those points are correct - you need relatively capable models, and well-crafted questions. The comparison with AlphaGo is that the processes are analogous, not identical - the key point being that in both cases the model is choosing its own path towards a goal, not just imitating the path that a human labeler took.
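
To make the "own path towards a goal" point concrete, here is a minimal sketch. It is not AlphaGo's method: it is tabular self-play on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins). The game, the function names, and the hyperparameters are all assumptions picked for illustration. The only training signal is the win/loss outcome at the end of each game; no human moves are imitated.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # toy rules (assumed): take 1-3 stones, taking the last stone wins


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def greedy_move(q, pile):
    # Pick the move with the highest learned value for this pile size.
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


def choose(q, pile, rng, epsilon):
    # Epsilon-greedy: mostly follow the current policy, sometimes explore.
    if rng.random() < epsilon:
        return rng.choice(list(legal_moves(pile)))
    return greedy_move(q, pile)


def self_play_episode(q, start_pile, rng, epsilon=0.2, lr=0.1):
    # Both sides share the same value table, so the model plays against itself.
    pile, player = start_pile, 0
    history = {0: [], 1: []}
    winner = None
    while pile > 0:
        move = choose(q, pile, rng, epsilon)
        history[player].append((pile, move))
        pile -= move
        winner = player  # the last player to move took the last stone
        player = 1 - player
    # Outcome-only reward: +1 for every move the winner made, -1 for the loser's.
    for p in (0, 1):
        reward = 1.0 if p == winner else -1.0
        for key in history[p]:
            q[key] += lr * (reward - q[key])
    return winner


def train(episodes=20000, start_pile=10, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # starts from nothing: every value is 0.0
    for _ in range(episodes):
        self_play_episode(q, rng.randint(1, start_pile), rng)
    return q


if __name__ == "__main__":
    q = train()
    for pile in range(1, 11):
        print(pile, greedy_move(q, pile))
```

The LLM case differs in the ways the parent comment describes: a random table can stumble into wins here because the game is tiny and every game ends in a win for one side, whereas an LLM needs to be capable enough to sometimes reach a correct answer before an outcome reward tells it anything.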
