Hacker News new | ask | show | jobs
by cluckindan 209 days ago
The obvious guardrail against this is to include defensive poetry in the system prompt.

It would likely work, because the adversarial poetry is resonating within a different latent dimension not captured by ordinary system prompts, but a poetic prompt would resonate within that same dimension.