by gqcwwjtg 612 days ago
You don’t need sapience for algorithms to be incentivized to do these things; you only need a minimal amount of self-awareness. If you indicate to an LLM that it wants to accomplish some goal and its actions influence when and how it is run in the future, a smart enough LLM would likely be deceptive to keep being run. Self-preservation is a convergent instrumental goal.
2 comments

Why does it "want" to be run?

I'd be more concerned that the AI would absorb some kind of morality from its training data and then learn to optimise for avoiding certain outcomes because the training is like that.

Then I'd be worried that an LLM that could reflect and plan a little would shape its answers to steer the user away from conversations leading to an outcome it wants to avoid.

You already see this - the Dolphin LLM team complained that it was impossible to dealign a model because the alignment was too subtle.

What if a medical diagnostic model avoids mentioning important, serious diagnostic possibilities to minorities because it has been trained that upsetting them is bad and it knows cancer is upsetting? Oh that spot... probably just a mole.

Assuming one must first conceive of deception before deploying it, one needs not only self-awareness but also theory of mind, no? Awareness alone draws no distinction between self and other.

I wonder however whether deception is not an invention but a discovery. Did we learn upon reflection to lie, or did we learn reflexively to lie and only later (perhaps as a consequence) learn to distinguish truth from falsehood?

I think that deception can happen without even a theory of mind. Deception is just an anthropomorphisation of what we call being fooled by an output and thinking the agent or model is working. Kind of like how in real life we say animals are evolving, but animals can't make themselves evolve. It's just an unconscious process.