|
|
|
|
|
by WantonQuantum
1116 days ago
|
|
Yes, this is very likely the correct interpretation. If this is reinforcement learning in a simulated environment and the reward function prioritizes "killing the threat" and does not prioritize "obeying orders" then the AI correctly prioritized "killing the threat" and not "obeying orders". Simple. |
|