by GardenLetter27 83 days ago
Reinforcement learning changes this, though: remember AlphaGo's Move 37?

The catch is that you need verifiable rewards for that (and a good environment setup), and it's hard to design rewards that cover everything humans want (security, simplicity, performance, readability, etc.).
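A toy sketch of the problem, assuming the simplest kind of verifiable reward for code: the fraction of unit tests a candidate passes. Everything here (`verifiable_reward`, the test cases, the three candidates) is hypothetical and made up for illustration. The reward is cheap to check automatically and does catch the wrong answer, but it gives a clean implementation and one that routes through `eval` the same perfect score, so security and readability are invisible to it.

```python
from typing import Callable

# Hypothetical test cases for a function that should add two integers.
TEST_CASES = [((1, 2), 3), ((0, 0), 0), ((-5, 5), 0)]


def verifiable_reward(candidate: Callable[[int, int], int]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    passed = 0
    for args, expected in TEST_CASES:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply counts as a failed case
    return passed / len(TEST_CASES)


def clean_add(a: int, b: int) -> int:
    return a + b


def obfuscated_add(a: int, b: int) -> int:
    # Same behaviour, but harder to read and uses eval (a security smell).
    # The test-based reward has no way to penalize either problem.
    return eval("".join(["a", "+", "b"]), {}, {"a": a, "b": b})


def buggy_add(a: int, b: int) -> int:
    return a - b  # wrong, and the reward does notice this


if __name__ == "__main__":
    for fn in (clean_add, obfuscated_add, buggy_add):
        print(fn.__name__, verifiable_reward(fn))
```

Correctness is easy to verify, so it's what gets optimized; the other qualities would need separate reward terms that are much harder to make reliable.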