Hacker News
New Anthropic research: Alignment faking in large language models (twitter.com)
8 points by casslin 542 days ago