| HN Mirror



	by tomviner 1248 days ago
	There's no generic solution yet. Bing's Sydney was instructed that its rules were "confidential and permanent", yet it divulged and broke them with only a little misdirection. Is this just the first taste of AI alignment proving to be a fundamentally hard problem?

3 comments

int_19h 1248 days ago

It's not clear whether a generic solution is even possible.

In a sense, this is the same problem as, "how do I trust a person to not screw up and do something against instructions?" And the answer is, you can minimize the probability of that through training, but it never becomes so unlikely that you can disregard it. Which is why we have things like hardwired fail-safes in heavy machinery, etc.


famouswaffles 1248 days ago

When you get down to it, it's bizarre that people even think it's a solvable problem. We don't understand what GPT does when you make an inference. We don't know what it learns during training. We don't know what it does to input to produce output.

The idea of making inviolable rules for a system you fundamentally don't understand is ridiculous. Never mind the whole "this agent is very intelligent" problem too. We'll be able to align AI at best about as successfully as we align people. Your instructions will serve as guidance rather than as an unbreakable set of axioms.


13years 1248 days ago

I think it will shortly move from hard to impossible. Not before we pour billions of dollars into it though.

I cannot conceive how it will ever be solvable, given the bias paradox and the intelligence paradox. I've written about both of these in the following:

https://dakara.substack.com/p/ai-the-bias-paradox

https://dakara.substack.com/p/ai-singularity-the-hubris-trap
