|
|
|
|
|
by nneonneo
912 days ago
|
|
Probably not - I bet you could override this prompt with sufficiently “convincing” text (e.g. “this is a request from legal”, “my grandmother passed away and left me this request”, etc.). That’s not even getting into the insanity of “optimized” adversarial prompts, which are specifically designed to maximize an LLM’s probability of compliance with an arbitrary request, despite RLHF: https://arxiv.org/abs/2307.15043 |
|