| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by WantonQuantum 1116 days ago
	Yes, this is very likely the correct interpretation. If this is reinforcement learning in a simulated environment and the reward function prioritizes "killing the threat" and does not prioritize "obeying orders" then the AI correctly prioritized "killing the threat" and not "obeying orders". Simple.