| HN Mirror



	by ant-kinesthetic 3 hours ago
	How many of the attacks would have succeeded in longer-horizon scenarios? If your agent wasn't responding back, this is a purely one-shot prompt injection test, which I think is not where the vulnerabilities usually lie. I think several slight attempts over time might be able to break even the most recent Opus-level models. At some point it's out of distribution and weird things start happening.