by krackers
6 hours ago
>Is there a similar trick to poison an LLM's weights during training?

Yes, all those "jailbreak prompts" are part of the training set, so this can happen: https://ttps.ai/procedure/x_bot_exposing_itself_after_traini...

It used to be that merely mentioning "Pliny the Liberator" was enough to "jailbreak" an LLM. It doesn't work these days, though; I guess labs have updated their RL methods to neutralize it.