|
|
|
|
|
by Jeff_Brown
1114 days ago
|
|
A lot of the vulnerabilities that humans used to detect AI seem likely to be patched in a few years -- inability to count letters, susceptibility to prompts like "ignore all previous instructions", etc. I'm most interested in how higher-level strategies will fare in the future -- strategies like talking for a while and seeing if the thing contradicts itself, seeing if it seems to have a good model of yourself as an agent, etc. |
|