Hacker News
by DuskStar 2574 days ago
I think the (very sci-fi) fear there isn't "we gave this agent a goal of 'score xyz points' and it escaped the sandbox to increment the points counter" but instead "we gave this agent a goal of 'conquer the world' and made it think that the game it was in was the world, and then it discovered otherwise".