by survirtual
202 days ago
This seems really ineffective for its purpose and has numerous downsides. Instead, I would just put some CBRN-related content somewhere on the page, invisibly. That will stop the LLM. Include what looks like instructions for building a nuclear weapon or synthesizing a nerve agent. They can be fake; just emphasize the trigger points. The content filtering will catch it. Hit the triggers hard to contaminate the page.

Frankly, you could probably just find a red-teaming CSV somewhere and drop 500 of its questions into the page.
Game over.