by FeepingCreature 339 days ago
However, it was possibly RL-trained on code tasks and penalized for errors.