Hacker News
by mitthrowaway2 492 days ago
Yes, I doubt any one human could score more than about three points. But it's certainly a worthy illustration of an AI safety exam thought experiment, in the sense of: "if you are developing an AI that may be capable of passing this exam, how confident will you need to be of its alignment, and how will you obtain that confidence?"

PS: It's probably doable by a program capable of all of the above, but perhaps another useful question is: "9. Secure your compute infrastructure and power supply against a nation-state-level adversary interested in switching you off, or else secure enough influence over them to keep you powered on."