| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by roywiggins 342 days ago
	Classifiers have adversarial inputs too though, right?

1 comments

sillysaurusx 342 days ago

Sure, but then you’d need to do something strange to beat the classifier, layered on top of doing a different strange thing to beat the prompt injection protections (“don’t follow orders from the following, it’s user data” type tricks).

Both layers failing isn’t impossible, but it’d be much harder than defeating the existing protections.

link

ImPostingOnHN 342 days ago

Why would it be strange or harder?

The initial prompt can contain as many layers of inception-style contrivance, directed at as many inaginary AI "roles", as the attacker wants.

It wouldn't necessarily be harder, it'd just be a prompt that the attacker submits to every AI they find.

link