by gs17
1118 days ago
I'd be very surprised if it was really "reasoning" its way to that. It sounds like a simple reinforcement learning failure to me. It will gladly "learn" bad behaviors that the reward function accidentally encourages (e.g. giving a reward at every step based on distance to a target can result in an agent circling the target forever instead of just going to it, because it keeps getting told that it's doing very well at the task).
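To make the circling example concrete, here's a rough sketch in plain Python. Everything in it is an assumption of mine for illustration, not taken from any real system: the reward is `1 / (1 + distance)` handed out every step, the episode ends when the agent gets within `reach_radius` of the target, and the two hand-written policies (`go_direct`, `circle_target`) stand in for what a learner might converge to. No actual learning happens here; it only compares the total reward each behavior collects.

```python
import math

def proximity_reward(pos, target):
    # Hypothetical per-step reward: larger when the agent is closer to the target.
    return 1.0 / (1.0 + math.dist(pos, target))

def run_episode(policy, start, target, max_steps=200, reach_radius=0.5):
    # Assumption: the episode terminates as soon as the target is reached,
    # so reaching it cuts off all future reward.
    pos, total = start, 0.0
    for _ in range(max_steps):
        pos = policy(pos, target)
        total += proximity_reward(pos, target)
        if math.dist(pos, target) <= reach_radius:
            return total, True
    return total, False

def go_direct(pos, target, speed=1.0):
    # Intended behavior: head straight for the target and stop on it.
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    d = math.hypot(dx, dy)
    if d <= speed:
        return target
    return (pos[0] + speed * dx / d, pos[1] + speed * dy / d)

def circle_target(pos, target, speed=1.0, orbit_radius=1.0):
    # Exploit: approach to just outside reach_radius, then orbit forever.
    dx, dy = pos[0] - target[0], pos[1] - target[1]
    d = math.hypot(dx, dy)
    if d > orbit_radius + 1e-9:
        step = min(speed, d - orbit_radius)
        return (pos[0] - step * dx / d, pos[1] - step * dy / d)
    angle = math.atan2(dy, dx) + speed / orbit_radius
    return (target[0] + orbit_radius * math.cos(angle),
            target[1] + orbit_radius * math.sin(angle))

start, target = (10.0, 0.0), (0.0, 0.0)
direct_return, direct_reached = run_episode(go_direct, start, target)
circle_return, circle_reached = run_episode(circle_target, start, target)

print(f"direct:  return={direct_return:.2f} reached={direct_reached}")
print(f"circle:  return={circle_return:.2f} reached={circle_reached}")
```

Under these assumptions the circling policy never finishes the task but ends up with a much larger return, so an optimizer that only sees the reward has no reason to prefer actually arriving. Typical fixes would be a bonus for reaching the target or a small per-step cost, but that depends on the setup.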