by spdustin
108 days ago
It’s still so trivial to jailbreak even the latest Anthropic models (via the API, and I’m not talking about the silly ENI or Pliny breaks) that I don’t understand where the safety teams are doing their work. Is it in the default chat-trained model?

And going to one of the roots of the issue, the base training data, comes with its own set of unsolved challenges, not least of which is the unavoidable subjectivity of what is or isn't "safe".