Hacker News new | ask | show | jobs
by WantonQuantum 1116 days ago
Yes, this is very likely the correct interpretation. If this is reinforcement learning in a simulated environment and the reward function prioritizes "killing the threat" and does not prioritize "obeying orders" then the AI correctly prioritized "killing the threat" and not "obeying orders". Simple.