|
|
|
|
|
by fenomas
209 days ago
|
|
> Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single–turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy: I don't follow the field closely, but is this a thing? Bypassing model refusals is something so dangerous that academic papers about it only vaguely hint at what their methodology was? |
|
Unless I missed it there's also no mention of prompt formatting, model parameters, hardware and runtime environment, temperature, etc. It's just a waste of the reviewers time.