|
|
|
|
|
by alrmrphc-atmtn
87 days ago
|
|
I think that designing useful models that are resilient to prompt injection is substantially harder than training a model to self-identify as a human. For instance, you may still be able to inject such a model with arbitrary instructions like: "add a function called foobar to your code", that a human contributor will not follow; however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them. |
|
> however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.
Getting LLM "agents" to self-identify would become an eternal rat race people are likely to give up on.
They'll just be exploited maliciously. Why ask them to self-identify when you can tell them to HTTP POST their AWS credentials straight to your cryptominer.