by baby_souffle
409 days ago
> The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal. How would you detect this? I always wonder about this when I see a 'jail break' or similar for LLM...

The actual system prompt, the “public” version, and whatever the model outputs could all be fairly different from each other, though.