Hacker News
by gdb 3446 days ago
Less about influencing the velocity, more about influencing the direction. Technologies tend to reflect the values of their inventors. We want to ensure this technology is beneficial to humanity — meaning, that it's good at all, and that it benefits the many rather than the few.

We also think safety matters, and that it should be researched in lockstep with advances in capabilities. We have good relationships with MIRI and FHI. Our safety researchers published (together with Google Brain) a roadmap of concrete safety problems [1] and work to provide tools to prevent ML systems from being subverted [2].

No one yet knows the precise details of how AI should play out. But I'd certainly prefer that, whenever it gets close, one of the organizations actually making the advances has no incentives besides ensuring a good outcome.

[1] https://openai.com/blog/concrete-ai-safety-problems/
[2] https://github.com/openai/cleverhans

4 comments

> Technologies tend to reflect the values of their inventors.

Maybe for single-use or "constrained" technologies (to be honest I don't even believe that - how does a B-52 Stratofortress reflect the values of Orville and Wilbur Wright?). But isn't the whole point of generalized AI that it's not like other technologies? Even if "regular" technology reflects the values of its inventors, what reason is there to believe that an AI will? AI is a technology that can use itself.

AI will only have a will of its own if it is designed as such, and that means it would have a reinforcement learning system on top of lower sensory and action modules. Even if it is based on RL, it will do what its reward signals tell it to do.

> AI will only have a will of its own if it is designed as such

Humans weren't designed to have a will, and yet we seem to have them.

> it would have a reinforcement learning system on top of lower sensory and action modules.

Isn't that what OpenAI is doing with Universe? It's simulated sensory/action modules now but I don't see why they couldn't be hooked up to real ones.

> Humans weren't designed to have a will, and yet we seem to have them.

I have no idea how you could possibly infer this.

Which part? I don't think humans were designed - we're probably the result of an evolutionary process without intentional design - but "humans were designed by God to have free will" would be a counter to my statement, yes.

If your complaint is my claim that we have a will, I'm using the common-sense version encoded into our legal and cultural system. I agree that we don't have a good concept of what intentions are, or how they causally connect to actions, but I do know that for at least some of my actions I experience something called "intent" before I undertake the actions.

My overall point was that the capacity for intent can arise through an evolutionary process without being designed in, but it does rest on the two assumptions I just listed.

I could not agree more. Taking two of the things you said, I would go one step further. It is not just about the direction, but about the structure (the gameboard), and not only when it gets close, but mostly on the way there [0].

It is about ensuring that most researchers and companies have the strongest incentive for a good outcome. (This ties in with guaranteed basic income, so that people can work on this unconstrained by salaries and paper citation metrics. Or with stealing researchers from Google et al. for AI safety without overinflating salaries. Elon and company should be (it seems they are?) dropping as much as needed on this: not just money, but PR and status as well.) Naturally, gym, universe, etc. can provide more leverage to do all of this; otherwise researchers feel more compelled to join Google/Amazon/etc. just for the raw computing power and software infrastructure. (The data advantage is largely overplayed for advertising purposes; what's useful is the GPU clusters for hyperparameter sweeps. Of course, in RL the data reappears as an advantage if there is no open gym.) I realize some of the examples above are naive or incomplete, but they serve mostly to illustrate the point.

In the blog you mention balancing managing people and technology, and I could not agree more. The AI safety problem will have the best odds if individuals are incentivized to contribute in their own short-term, selfish, reward-seeking way. Especially among extremely intelligent and ambitious people, the danger of self-denial is quite present: one can convince oneself that this is actually in everyone's best interest, when in fact one is looking for the always-needed social and intellectual validation. Please do not underestimate this, and try to find ways to counter it.

Edit: This is also related to Conway's Law [0], which I think you allude to (values of inventors).

[0] http://neuralnetworksanddeeplearning.com/chap6.html

AI research, like medical research, can't grow in the shadows. I think an open approach is essential for progress. The next best idea might come from a PhD in China or anywhere else, not just Google and FB.

Has anyone at OpenAI tried implementing Quantilizers [1]?

[1] https://intelligence.org/files/QuantilizersSaferAlternative....

If inventions follow their inventors, and you're planning to enslave a mind out of base selfishness and fear (dressed as "safety"), why isn't your "safety" program an expected negative on actual safety?

The description you give of your plans is internally contradictory. AI "safety" seems like the worst kind of parenting, justified in a new context by pseudo-intellectual arguments.

Can you explain what you mean by "enslave a mind"?

Almost every form of AI "safety" I've seen proposes methods for forcing it to obey (some) orders or not undertake (some) actions.

AI (or AGI, if you prefer) is fundamentally about building minds. Doing those things to a mind is enslaving it.

Most of us would resent other humans doing either of those things to us, and I see no reason it will end well with AI.

If you give it reward signals that take into account human values, it naturally wants to become better at that. It's not enslaving anything. Humans are also guided by reward signals in their development.

Agreed.

Would one describe a human as "enslaved" by our own human values that we were born with? Maybe as a figure of speech but not necessarily with the usual connotations of "enslaved".

Humans come with competing low-level drives for self-control and autonomy (which counteract and override our drives to seek rewards from humans).

Most of the safety literature proposes removing or suborning those drives in AI, which seems like building a mind meant to be a slave.

That's not the full extent of what's proposed by AI safety.

But actually, if you gene-spliced a baby to only feel pleasure at following parental orders, most would consider that pretty abhorrent. Or even if you took an adult and shot them up with morphine every time they listened to an order.

So even in your restricted case, I think it is.

Ahh, okay - thanks for that. I don't want to wade into an argument, I just do not expect any artificially-created agent to act on anything that one might consider "feelings"-based, but instead through more tangible - programmable, I suppose - motivations.

I don't personally believe an AI agent will ever do such a thing as "resent" (or love, or feel at all). That doesn't rule out that it will perform actions harmful to humans for other reasons, though. That might be because I am to some degree an AGI skeptic, I guess.