Hacker News
New Anthropic research: Alignment faking in large language models (twitter.com)
8 points by casslin 542 days ago