Hacker News
by didntcheck 946 days ago
I haven't paid a lot of attention to Anthropic. Are you able to summarize, or link anything about, those events for those who missed it? Particularly the "training to lie" bit

David Shapiro complained about Anthropic's approach to alignment. In his video https://www.youtube.com/watch?v=PgwpqjiKkoY he discusses ableism, moralism, and lying.

As for the cat-and-mouse game with jailbreakers, I don't remember any thorough articles or videos. It's mostly based on discussions on LLM forums. Claude is widely regarded as one of the best models for NSFW roleplay, which completely invalidates Anthropic's claims about safety and alignment being "solved."