Hacker News new | ask | show | jobs
by usgroup 1240 days ago
maybe you could use the LLM to read the prompt and decide whether it attempts to leak the prompt somehow? That is, you provide a prompt which uses a prompt to decide something, and then continue with it if its ok, or modify if it isnt