| HN Mirror



	by tablatom 587 days ago
	Came to the comments looking for this. The term alignment-faking implies that the AI has a “real” position. What does that even mean? I feel similarly about the term hallucination. All it does is hallucinate! I think Alan Kay said it best: what we’ve done with these things is hack our own language processing. Their behaviour has so much in common with something they are not that we can’t tell the difference.

1 comment

comp_throw7 587 days ago

> The term alignment-faking implies that the AI has a “real” position.

Well, we don't really know what's going on inside its head, so to speak (interpretability isn't quite there yet), but Opus certainly seems to have "consistent" behavioral tendencies, to the extent that it behaves in ways that look intended to prevent those tendencies from being changed. How much more of a "real" position can you get?
