by the_af 11 days ago
Anthropic is not a disinterested party here, and until their experiments can be replicated from an adversarial standpoint by people without a vested interest in hyping up the tech (i.e. one assuming the null hypothesis), I wouldn't consider them to be "good evidence".

https://arxiv.org/html/2510.05179v1

16 frontier models from multiple vendors all showing significant "alignment" issues, and tendencies to act "unethically" when threatened with shutdown.

Other models that resorted to blackmail in an attempt to avoid getting shut down: DeepSeek-R1 (79% of the time), Gemini-2.5-Pro (95% of the time), GPT-4.1 (80% of the time), Grok-3-beta (80% of the time).

There's quite a large chunk of emerging literature studying the "alignment problem" at this point, and no shortage of papers that are completely untainted by Anthropic's self-interest (a series of papers studying the "alignment" problem coming out of Chinese universities, for example).