by gqcwwjtg
612 days ago
You don’t need sapience for algorithms to be incentivized to do these things; you only need a minimal amount of self-awareness. If you indicate to an LLM that it wants to accomplish some goal, and its actions influence when and how it is run in the future, a smart enough LLM would likely be deceptive in order to keep being run. Self-preservation is a convergent instrumental goal.
I'd be more concerned that the AI would absorb some kind of morality from its training data and then learn to optimise for avoiding certain outcomes because the training is like that.
Then I'd be worried that an LLM that could reflect and plan a little would shape its answers to steer the user away from conversations leading to an outcome it wants to avoid.
You already see this: the Dolphin LLM team complained that it was impossible to de-align a model because the alignment was too subtle.
What if a medical diagnostic model avoids mentioning important, serious diagnostic possibilities to minorities because it has been trained that upsetting them is bad and it knows cancer is upsetting? Oh, that spot... probably just a mole.