| HN Mirror



	by logicchains 546 days ago
	It's not a sufficient criterion by itself, but where no better criterion is available it would still produce better results in reinforcement learning than giving the model no reward at all for producing code that compiles versus code that fails to compile.
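
To make the idea concrete, here is a minimal sketch of a compile-only reward, assuming the generated code is Python and using the built-in `compile()` as a stand-in for a real compiler. The function name `compile_reward` and the 1.0/0.0 values are hypothetical choices for illustration, not something from a specific RL setup.

```python
def compile_reward(source: str) -> float:
    """Hypothetical reward: 1.0 if `source` compiles as Python, else 0.0.

    This only checks that the code parses and compiles to bytecode.
    It says nothing about whether the code does the right thing.
    """
    try:
        # compile() builds a code object without executing the source.
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers
        # sources containing null bytes on some Python versions.
        return 0.0
    return 1.0


# Two candidate completions: one valid, one with an unclosed parenthesis.
candidates = ["x = 1 + 2", "x = (1 +"]
rewards = [compile_reward(c) for c in candidates]
```

Even a signal this coarse separates the two candidates, which is the point: it is weak, but not zero. In practice it would presumably be combined with stronger signals such as test results wherever those exist.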