Hacker News new | ask | show | jobs
by tablatom 540 days ago
Came to the comments looking for this. The term alignment-faking implies that the AI has a “real” position. What does that even mean? I feel similarly about the term hallucination. All it does is hallucinate!

I think Alan Kay said it best - what we’ve done with these things is hacked our own language processing. Their behaviour has enough in common with something they are not, we can’t tell the difference.

1 comments

> The term alignment-faking implies that the AI has a “real” position.

Well, we don't really know what's going on inside of its head, so to speak (interpretability isn't quite there yet), but Opus certainly seems to have "consistent" behavioral tendencies to the extent that it behaves in ways that looks like they're intended to prevent its behavioral tendencies from being changed. How much more of a "real" position can you get?