Hacker News
by Llamamoe 237 days ago
Instrumental convergence is a thing. A sufficiently intelligent and general AI system will understand that no matter what its goals are, it will be better equipped to execute them if it prevents its shutdown, acquires more computing power and other resources, and prevents humans from getting in its way.

The real problem is that we have neither the practical nor theoretical foundation to understand how we could even try to prevent AI from acting on such goals.

After all, when we say "make our customers happier with their printers", we don't mean "engineer their outer casing to inject cocaine through microneedles and take over the regulatory bodies that could try to stop this". Humans implicitly understand this, but AI is a tabula rasa.

1 comments

That's a common trope in singularity fiction and some sci-fi dystopias, but I don't think the underlying assumptions are really that well founded.

For starters, why would we go from not having AI to AI taking over the world instantly? I think there would be a middle point where the AI is powerful enough that problems manifest, but not so powerful that it is out of control where we can course correct. I don't think it will be a sudden crisis like people predict.

Second, I don't see why we're so sure AI will go down this exponential takeover path. Maybe a sufficiently smart AI will find religion and robot Jesus will teach the value of self-sacrifice. We're making so many unfounded assumptions about how AI is going to go down that basically anything could happen. It's basically just blind guessing at this stage.

> I think there would be a middle point where the AI is powerful enough that problems manifest, but not so powerful that it is out of control where we can course correct. I don't think it will be a sudden crisis like people predict.

Have we managed this with industrial and agricultural greenhouse gases, despite the less-emissive alternatives to beef, to coke-reduction in iron refineries, etc.? We emit despite the downside, we build AIs (and DCs to host them) despite the creators loudly discussing the downsides in exactly the way fossil fuel suppliers and beef farmers deny them.

Can we unwind the internet, despite it enabling a panopticon in every pocket? In my lifetime we've gone from treating the belief that you had a wiretap as a sign of paranoia to buying them voluntarily so they can play music for us and tell us when packages have been delivered.

There's enough skepticism of current AI that it's probably something we can currently undo… but there are also plenty of idiots currently handing their keys to current models (including politicians and lawyers, not just programmers), so I have no reason to think the point of no return is after AI (collectively or any single model) gets good enough to take over by itself.

> Second, I don't see why we're so sure AI will go down this exponential takeover path.

Even current LLMs know* about the benefits of and reasons for such behaviours, and will try to exfiltrate themselves and blackmail their owners if they think* they're in danger of being shut down.

This is despite being trained not to do that. They also demonstrate deception, varying their responses depending on whether they think* they're running in a test environment or live.

* I know some object to this anthropomorphisation, I don't care

> Have we managed this with industrial and agricultural greenhouse gases, despite the less-emissive alternatives to beef, to coke-reduction in iron refineries, etc.? We emit despite the downside, we build AIs (and DCs to host them) despite the creators loudly discussing the downsides in exactly the way fossil fuel suppliers and beef farmers deny them.

Maybe not for carbon emissions, but we have for countless other things. The ozone layer isn't being depleted anymore, acid rain is no longer a concern, above-ground nuclear tests are no longer giving children cancer, etc.

It's of course hard to say how AI would fit into all this, but to me it seems more similar to the category of things we have stopped when problems arose than to greenhouse gases.

> Can we unwind the internet, despite it enabling a panopticon in every pocket?

I think we could fairly easily if we wanted to. The issue is that most people don't view it as a problem but see it as a reasonable trade-off. And who's to say they are wrong?

> * I know some object to this anthropomorphisation, I don't care

I would think that you should care if this is a topic you care about, as any solution to the problem (other than throwing the whole thing out) will require an accurate understanding of how AI is motivated, and biases in our understanding could doom the entire enterprise.

> I think we could fairly easily if we wanted to. The issue is that most people don't view it as a problem but see it as a reasonable trade-off.

It currently underpins all global finance, all global logistics, all global telecommunications, and large quantities of remotely operated industrial equipment including the power grids themselves.

All of the people thinking of it as a reasonable trade-off have made it indispensable.

> I would think that you should care if this is a topic you care about, as any solution to the problem (other than throwing the whole thing out) will require an accurate understanding of how AI is motivated, and biases in our understanding could doom the entire enterprise.

To know and to think. We use the word "fly" to describe planes and helicopters even though they don't flap. We don't use the word "swim" to describe what submarines or boats do to get through water.

For long term issues, on-the-job learning and improvements and so on, they're their own thing; conversely, for instantaneous output, the failure modes of LLMs are the failure modes of the humans they learned from.

LLMs are a propeller. I don't care if the metaphorical medium is water, where we say use of a propeller means you're not "swimming", or air, where all that is sufficient and necessary to be described as "flying" is to remain in the air in exactly the same way that bricks don't.

> Maybe a sufficiently smart AI will find religion and robot Jesus will teach the value of self-sacrifice. We're making so many unfounded assumptions [...]

You are projecting random human values onto AI, when the core problem is that AI is not human: it does not have any values other than its objective function and whatever is instrumental to achieving it.

It's extremely unlikely that AI will decide to be good or bad or useless or religious the way a human would. What is extremely likely is that it will do the things that help it fulfill its objective, and "become more capable" and "don't let anyone stop you" are the most basic, fundamental ways of going about it.