by logicchains 498 days ago
It's not a sufficient criterion by itself, but where no better criterion is available, rewarding code that compiles over code that fails to compile would still produce better results in reinforcement learning than giving the model no such reward at all.
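
To make that concrete, here's a rough sketch of what such a reward signal could look like. It assumes the target language is Python and uses the built-in `compile` as a stand-in for a real compiler. The name `compile_reward` and the 1.0/0.0 values are my own illustrative choices, not from any particular RL setup.

```python
def compile_reward(source: str) -> float:
    """Hypothetical binary reward: 1.0 if `source` compiles as Python, else 0.0.

    This only checks that the code parses and compiles to bytecode.
    It never runs the code, so it says nothing about correctness.
    """
    try:
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs
        # such as null bytes on some Python versions.
        return 0.0
    return 1.0


# Compiles, so it earns the reward.
good = compile_reward("def add(a, b):\n    return a + b\n")

# Fails to compile, so it earns nothing.
broken = compile_reward("def add(a, b:\n    return a + b\n")

# Compiles but is wrong: this is why the criterion isn't sufficient on its own.
wrong = compile_reward("def add(a, b):\n    return a - b\n")
```

The third case gets the same reward as the first, which is exactly the gap: the signal separates compiling from non-compiling output, nothing more.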