| HN Mirror



	by Jeff_Brown 1114 days ago
	A lot of the vulnerabilities that humans have used to detect AI seem likely to be patched in a few years -- inability to count letters, susceptibility to prompts like "ignore all previous instructions", etc. I'm most interested in how higher-level strategies will fare in the future -- strategies like talking for a while and seeing if the thing contradicts itself, seeing if it seems to have a good model of you as an agent, etc.