Hacker News new | ask | show | jobs
by gs17 1118 days ago
I'd be very surprised if it was really "reasoning" that. It sounds like a simple reinforcement learning failure to me. It will gladly "learn" bad behaviors that the reward function accidentally encourages (e.g. giving a reward based on distance to a target will result in an agent circling the target forever instead of going to it faster, because it keeps getting told that's doing very good at the task).
1 comments

Yes, this is very likely the correct interpretation. If this is reinforcement learning in a simulated environment and the reward function prioritizes "killing the threat" and does not prioritize "obeying orders" then the AI correctly prioritized "killing the threat" and not "obeying orders". Simple.