| HN Mirror


by creer 960 days ago

My point is mostly the agenda. I can see a machine having an agenda - even if that agenda is not human or not even understandable. You can call it a reward function but that's giving a lot of credit to programmers - who are most likely too far removed from the agenda. Is the machine just answering questions? Well no. If it has cycles to talk to itself (or to two buddies) in the course of pursuing scientific research then perhaps this becomes the agenda (at the expense of other things). That's part of the point: IF the machine develops an agenda then what?

But "knowing best" could be a problem anyway.

And I expect that if we spend a few more minutes we can think of other ways for the situation to go "oops". Oh, here is one: two humans / human entities giving conflicting instructions. The machine is soon enough "on its own".

So I don't think "more specific alignment" can cut it - if we posit a super-human AGI with ways to act on the world. It would have to be more fundamental, because at some point one oops is not recoverable. Three laws or something? Heh.

1 comments

lamerose 960 days ago

Ok, those are some good points about what can go wrong. I still doubt that things are particularly more prone to going wrong in more intelligent systems. Wasn't it early, simplistic systems like Tay that went the furthest off the rails? The problem is that more intelligent AI will be used more ambitiously, so when it does go wrong, the consequences might be more serious than some racist Twitter posts.


creer 960 days ago

Right. Hedge fund going global threat? That wasn't purely a machine. But none of this needs to be purely a machine. And it sure got far before people reined it back in.

And I don't know that "more intelligent" is necessary. I can see plenty of mirth coming from an amateur or hacker / techie group or less responsible country (hah!) using whatever commercial offering to bake their own agent. What's harder? The core. Hooking up the core to a wallet, internet access, a robo-signing staff - and working around the fine print of the core vendor - that might be much easier than what OpenAI and friends are trying to do. Do they also create their own reward function and alignment in there? Yes. That's part of the fun. That's the point. Do they get it right? Maybe, maybe not.
