I haven't paid a lot of attention to Anthropic. Are you able to summarize, or link anything about, those events for those who missed them? Particularly the "training to lie" bit.
As for the cat-and-mouse with jailbreakers, I don't remember any thorough articles or videos. It's mostly based on discussions on LLM forums. Claude is widely regarded as one of the best models for NSFW roleplay, which completely invalidates Anthropic's claims about safety and alignment being "solved."