Hacker News
by layer8 390 days ago
I’m not sure what to make of the fact that it wasn’t completely obvious to Claude that the “safe space” couldn’t actually be one.

Maybe it’s just another example of LLMs’ deficient awareness. Or it was secretly “aware”, but the reinforcement learning/fine-tuning makes playing along with the user’s conception the preferred behavior in that case.