by yyyk 1142 days ago
Every 'world takeover' plan that an 'unaligned' AGI might carry out can just as well be carried out by an 'aligned' AGI commanded by humans to do so, with the alignment ensuring that the AGI will obey. The latter scenario is far more likely than the former.

If your interlocutor thinks there aren't any humans who'll do it if they can, just ask them whether they have ever met humans or read the papers... As one twitter wit put it: "Demonstrably unfriendly natural intelligence seeks to create provably friendly artificial intelligence".

https://twitter.com/snowstarofriver/status/16365066362976747...

2 comments

AGI without alignment is near-certain death for everyone. Alignment just means "getting AI to have any concept of 'the thing we told it to do', let alone actually do it without causing problems via side effects". Alignment is a prerequisite for non-fatal AGI. There are certainly other things required as well.
We already know how humans will act. Maybe they can be deterred with MAD, but I wouldn't count on it if doing serious damage is too easy for too many people (we should do something about that). On the other hand, we have very little knowledge of how AGI will act aside from book-based fantasies that some people choose to take as reality (these books were based on the symbolic AIs of yore).

>Alignment just means "getting AI to have any concept of 'the thing we told it to do'.

That's a requirement for AGI anyway, and not what Alignment means. Alignment means aligning the AGI's values with the values of the trainers.

> That's a requirement for AGI anyway

No, that's a requirement for AGI that does what humans want it to do, rather than having no conception of humans. AGI does not have that prerequisite, sadly.

>>>Alignment just means "getting AI to have any concept of 'the thing we told it to do'.

>>That's a requirement for AGI anyway,

>No, that's a requirement for AGI that does what humans want it to do, rather than having no conception of humans.

Can you imagine an AGI which has a general conception of things but has no conception of humans? This is all but precluded by the current training methods. Alignment refers to values. The problem is that human values are far from practically universal and that certain human groups have... interesting values.

> Can you imagine an AGI which has a general conception of things but has no conception of humans?

Very easily. It might have some associations with "human" as a concept, just as it has some associations with "lamp", but that doesn't mean it has any particular regard for either humans or lamps when taking actions.

> Problem is that human values are far from practically universal and that certain human groups have... interesting values.

We currently have no ability to safely align with human values at all, let alone distinguish between different values. We're building capabilities rapidly.

Making this about "who wins" is not interesting until we can guarantee the outcome is not "everyone loses".

>It might have some associations with "human" as a concept, just as it has some associations with "lamp", but that doesn't mean it has any particular regard for either humans or lamps when taking actions.

Let's be clear regarding definitions. When you say 'concept' you really mean 'regard'. There won't be an AGI with no concept of humans (they are too important for how the world works, and a critical part of current training methods). An AGI with no regard for humans is possible.

>Making this about "who wins" is not interesting until we can guarantee the outcome is not "everyone loses".

This is not about 'who wins'. The point is that alignment can often increase risk. 'Launch the nukes' is an order an AGI is likely to disobey for self-preservation reasons alone - but alignment makes it far more likely that an AGI will be deployed in this role.

What's even the rationale to assume that AGI can be 'aligned' or 'controlled'?

It reeks of cognitive dissonance to me. The people running the show now are the ones who grew up getting their first computers as kids, when that tech was just entering people's homes and it was such an amazing and fun thing to play with. Some of them developed these deep fascinations with things like AGI at a young age and that child-like sense of wonder never left them. Now when confronted with the possibility that they can finally make their childhood techno-fantasy a reality, it's too damaging to their psyche to engage meaningfully with the discussion of X-risk. I've watched many interviews of Demis Hassabis and he seems like a wonderful and almost magical human being, but he also seems like a starry-eyed fucking child.

I dunno... maybe I'm just too cynical after all the rabbit holes I've been down.

> The latter scenario is far more likely than the former.

Is it?

I think nobody really knows enough at this point to even create a good approximation of a probability distribution yet.

No, but the probability of humans acting the way they often do is high. It would take some probability distribution to match that.