by kortilla
542 days ago
I don’t buy it. Alignment faking has very little overlap with the motivation to do something with no prompt. Look at the Hacker News comments on alignment faking for how “fake” a problem that really is. It’s just more reacting to inputs and trying to align them with previous prompts.