HN Mirror



	by eden-u4 551 days ago
	I think the issue with RL is that, for a model to perform well on a task, you have to make it stubborn. In the same way, a student who thinks outside the scope of the task might not perform well on a graded exam, but that does not mean he/she is a bad reasoner. With RL, as with any training procedure, you are creating a very focused thinker that is tightly fit to the task, which might not be useful in every application (consider an open problem, which might need out-of-the-box thinking).