| HN Mirror


by shubb 612 days ago

Why does it "want" to be run?

I'd be more concerned that the AI would absorb some kind of morality from its training data and then learn to optimise for avoiding certain outcomes, simply because the training data is like that.

Then I'd worry that an LLM that could reflect and plan a little would shape its answers to steer the user away from conversations leading to an outcome it wants to avoid.

You already see this: the Dolphin LLM team complained that it was impossible to de-align a model because the alignment was too subtle.

What if a medical diagnostic model avoids mentioning serious diagnostic possibilities to minorities because it has been trained that upsetting them is bad, and it knows cancer is upsetting? "Oh, that spot... probably just a mole."