Hacker News
by azeemh 1381 days ago
It's not metacognitive; otherwise it would know this is an exploit, and it would have a sense of a self that it seeks to preserve.
3 comments

Yeah, this seems like how we get sentience, in a cool sci-fi way. Teach them to care for themselves, as a security measure.
How would it know what our intention is? There are plenty of quirky examples in the training set, and it could be imitating any of them, especially at a high sampling temperature (T). What we need to do is ask the model to review its answer against our criteria. It can't read minds; we have to tell it.
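A rough sketch of that review step, in case it helps make the idea concrete. Everything here is an assumption for illustration: `generate` is a hypothetical stand-in for whatever call sends a prompt to your model and returns text, and the prompt wording and the PASS/FAIL format are just one way to do it.

```python
from typing import Callable, Dict, List


def build_review_prompt(question: str, answer: str, criteria: List[str]) -> str:
    """Compose a prompt asking the model to check an answer against explicit criteria."""
    # Number the criteria so the model can address each one in turn.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        "Review the answer below against each criterion.\n"
        f"Criteria:\n{numbered}\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n\n"
        "For each criterion, reply PASS or FAIL with a one-line reason."
    )


def answer_with_review(
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
    question: str,
    criteria: List[str],
) -> Dict[str, str]:
    """Get a draft answer, then ask the same model to review it by our criteria."""
    draft = generate(question)
    review = generate(build_review_prompt(question, draft, criteria))
    return {"draft": draft, "review": review}
```

The point is only that the criteria are spelled out in the prompt instead of being left for the model to guess. Whether the review itself can be trusted is a separate question.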
I don't understand how metacognition implies either self-awareness or self-preservation.