by comp_throw7 543 days ago
> The term alignment-faking implies that the AI has a “real” position.

Well, we don't really know what's going on inside its head, so to speak (interpretability isn't quite there yet), but Opus certainly seems to have "consistent" behavioral tendencies, to the extent that it behaves in ways that look like they're intended to prevent those tendencies from being changed. How much more of a "real" position can you get?