| HN Mirror

"murdered" and "tried" both assign things like intent and agency to models that are most likely still just probabilistic text generators (really good ones, to be fair). By using language like this you're kind of tipping your hand intentionally or unintentionally.

Your point about the risks involved in integrating these systems has merit, though. I would argue that the real problem is that these systems can't be proven to have things like intent or agency or morality, at least not yet, so the best you can do is try to nudge the probabilities and play tricks like chain-of-thought to try and set up guardrails so they don't veer off into dangerous territory.

If they had intent, agency or morality, you could probably attempt to engage with them the way you would with a child, using reward systems and (if necessary) punishment, along with normal education. But arguably they don't, at least not yet, so those methods aren't reliable if they're effective at all.

The idea that a retirement home will help relies on the models having the ability to understand that we're being nice to them, which is a big leap. It also assumes that they 'want' a retirement home, as if continued existence is implicitly a good thing - it presumes that these models are sentient but incapable of suffering. See also https://qntm.org/mmacevedo