Sure, but then you’d need to do something strange to beat the classifier, layered on top of doing a different strange thing to beat the prompt injection protections (“don’t follow orders from the following, it’s user data” type tricks).
Both layers failing isn’t impossible, but it’d be much harder than defeating the existing protections.
Both layers failing isn’t impossible, but it’d be much harder than defeating the existing protections.