| HN Mirror


by fiatmoney 3651 days ago

These are not asking the right questions, although they kind of hint at it, and they are not fundamentally questions about AI. Example: "Can we transform an RL agent's reward function to avoid undesired effects on the environment?" Trivially, the answer is yes: put a weight on whatever effect you're trying to mitigate, sized according to how much potential benefit you're willing to trade away. They qualify this by saying essentially "... but without specifying every little thing". So what you're trying to do is build a rigorous (i.e., specified by code or data) model of what a human would think is "reasonable" behavior, while still preserving freedom for Gordian-knot-style solutions that trade off things you don't care about in unexpected ways.
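To make the "put a weight on it" point concrete, here is a minimal sketch. It is not from the paper; `penalized_reward`, the toy state keys `boxes_moved` and `vases_broken`, and the weight values are all made up for illustration. It only works because the side effect has already been named and measured, which is exactly the part that is hard in general.

```python
from typing import Callable, Dict

# Hypothetical state representation: a dict of named measurements.
State = Dict[str, float]


def penalized_reward(
    base_reward: Callable[[State], float],
    side_effect: Callable[[State], float],
    weight: float,
) -> Callable[[State], float]:
    """Return a reward function equal to base_reward minus weight * side_effect."""

    def shaped(state: State) -> float:
        return base_reward(state) - weight * side_effect(state)

    return shaped


# Toy task (assumed): reward per box moved, penalty per vase broken.
def task_reward(state: State) -> float:
    return state["boxes_moved"]


def vases_broken(state: State) -> float:
    return state["vases_broken"]


careful = {"boxes_moved": 3.0, "vases_broken": 0.0}
reckless = {"boxes_moved": 4.0, "vases_broken": 1.0}

# A large weight makes the careful outcome preferable.
strict = penalized_reward(task_reward, vases_broken, weight=5.0)
print(strict(careful))   # 3.0
print(strict(reckless))  # -1.0

# A small weight says the extra box is worth a broken vase.
lenient = penalized_reward(task_reward, vases_broken, weight=0.5)
print(lenient(careful))   # 3.0
print(lenient(reckless))  # 3.5
```

The weight is the whole tradeoff: at 5.0 the careful outcome wins, at 0.5 the reckless one does. Anything not listed as a side effect (here, everything except vases) is implicitly weighted at zero.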

The hard part is actually figuring out what you care about, particularly in the context of a truly universal optimizer that can decide to trade off anything in the pursuit of its objectives.

This has been a core problem of philosophy for 3000 years - that is, putting some amount of rigorous codification behind human preferences. You could think of it as a branch of deontology, or maybe aesthetics. It is extremely unlikely that a group sponsored by Sam Altman, whose brilliant idea was "let's put the government in charge of it" [1], will make a breakthrough there.

I don't actually doubt that AI would have philosophical implications, and philosophers like Nick Land have actually explored some of that area. But I severely doubt the ability of AI researchers to do serious philosophy and simultaneously build an AI that reifies those concepts.

[1] http://blog.samaltman.com/machine-intelligence-part-2

2 comments

argonaut 3651 days ago

You're dismissing the paper for not asking the right questions, but you don't propose any questions that you think are better.

> The hard part is actually figuring out what you care about, particularly in the context of a truly universal optimizer that can decide to trade off anything in the pursuit of its objectives.

This seems basically equivalent to what they are saying. A reward function that rewards "what we actually care about." This might seem vague, but that's fine because these are only proposed problems.


akvadrako 3651 days ago

I'm not sure what point you are trying to make. It's possible to dismiss an idea without providing an alternative. Yes, finding a reward function is equivalent to figuring out what we care about. Both are about as hard as teaching a bacterium to play piano.

The goal is avoiding unsafe AI. The reason such pointless efforts are wasted on this approach is that we don't have a good alternative. The only thing I can think of is delaying its creation indefinitely, but that's also a difficult challenge. For example, in the Dune books, the government outlaws all computers. That might work for a while.


argonaut 3651 days ago

Let me elaborate. It is easy easy easy to nitpick and find holes in someone's proposals, someone's problem statements, and someone's goals in life.

A statement adds noise, and less than nothing of value, if it just tells people they are working on the wrong thing without going on to say what they should be working on instead, with clear positive reasons why (rather than only negative reasons not to work on something).

Incidentally this is a broader problem with HN discourse.


stevetrewick 3651 days ago

From my cold, dead hands.


GuiA 3651 days ago

> You're dismissing the paper for not asking the right questions, but you don't propose any questions that you think are better.

One might wish to point out that the emperor has no clothes and yet have no desire to plan his majesty's outfits for the next 6 months.


argonaut 3651 days ago

A witty saying proves nothing. See my reply to the other comment.


fiatmoney 3651 days ago

I'm saying they're modelling the problem incorrectly as a CS endeavor when it has a lot more to do with analytic philosophy.

"How do I make a program make beautiful music" is a CS problem, but only after you have some notion of aesthetics in the first place.

In the context of a universal optimizer, "how do we make this program behave reasonably without bad side effects" is maybe a CS problem, but it's predicated on "how do we codify our notion of reasonable behavior", which is analytic philosophy with probably a bit of social science thrown in.

Problem-posing is itself difficult and how a lot of philosophical breakthroughs are made. If you want rigorous problem-posing where the solution would be handy for AI, hiring a philosopher might be a good start. Very few of us are equipped to do this kind of work, certainly not here in the comments section.


marvin 3651 days ago

I think you've hit the nail on the head here. This feels to me like one of the few responses in this thread that get to the core of the issue if we're thinking about long-term AI risk.

In fact, I'm surprised that there doesn't seem to be any reference in the article to previous work on these philosophical implications, e.g. the stuff that has been written by Nick Bostrom or MIRI. Perhaps there are some in the paper?

I think that for the foreseeable future, we will inevitably end up with two of the problems that various philosophers have outlined over the last few years:

(1) How do we ensure that an AI agent does exactly what we want it to do and

(2) What do we ultimately want if we can desire anything?

I think that any developer trying to approach this will be doomed to hack around these two issues. We can probably come a long way in AI capabilities without having the optimal solution to this, but the core problem will remain for a long time and haunt those who are cautious.


argonaut 3651 days ago

(1) is exactly what this paper is addressing. The fact that there is philosophical ambiguity is precisely why these are problems and not already solved.
