by nialv7
311 days ago
thoughts in the field say instead of a model that is pre-trained normally then censored, this is a model pre-trained on filtered data. i.e. it has never seen anything that is unsafe, ever. you can't jailbreak when there is nothing "outside".
I don't think that's true. You can't ask it outright "How do you make a Molotov cocktail?", but if you start by talking about what the policies allow and disallow, then about how examples of disallowed content would look, and eventually ask it for the "general principles" of how to make a Molotov cocktail, it'll happily oblige by giving you essentially enough information to build one.
So it does know how to make a Molotov cocktail, for example, but (mostly) refuses to share that knowledge.