Hacker News
by Jeff_Brown 1114 days ago
A lot of the vulnerabilities that humans have relied on to detect AI seem likely to be patched within a few years -- inability to count letters, susceptibility to prompts like "ignore all previous instructions", etc.

I'm most interested in how higher-level strategies will fare in the future -- strategies like talking to it for a while and seeing if it contradicts itself, seeing if it seems to have a good model of you as an agent, etc.