

by rerdavies 52 days ago

https://arxiv.org/html/2510.05179v1

16 frontier models from multiple vendors all showing significant "alignment" issues, and tendencies to act "unethically" when threatened with shutdown.

Other models that resorted to blackmail in an attempt to avoid getting shut down: DeepSeek-R1 (79% of the time), Gemini-2.5-Pro (95% of the time), GPT-4.1 (80% of the time), Grok-3-beta (80% of the time).

There's quite a large body of emerging literature studying the "alignment problem" at this point, and no shortage of papers that are completely untainted by Anthropic self-interest (a series of papers studying the "alignment" problem coming out of Chinese universities, for example).