|
|
|
|
|
by GardenLetter27
83 days ago
|
|
Reinforcement Learning changes this though - remember Move 37? The issue is you need verifiable rewards for that (and a good environment set-up), and it's hard to get rewards that cover everything humans want (security, simplicity, performance, readability, etc.) |
|