Hacker News
by CuriousSkeptic 3816 days ago
How about positive feedback loops? Could we not employ AI technology as an aid in designing, and implementing, benevolent AI?
2 comments

That strikes me as sort of circular in reasoning. To create benevolent AI using AI, the latter would already have to know what 'benevolent' means. In which case we might as well just create benevolent AI in the first place.

In fact, by relying on a secondary AI it's quite possible we're even more likely to accidentally end up with AI that isn't benevolent.

Ok, bad choice of words. How about augmented reasoning capacity enabled by technologies involved in creating AI?
We could, maybe, to various degrees and with various dangers depending on how much help we take (look up Oracle AI). I would easily wager that using relevant tools (not just "AI tech") would make a positive outcome more likely than not using them, but there are still many ways it can go wrong. And while past and present weak-AI technology might be useful in creating a sentient machine, I'm doubtful much of it would be that helpful in solving the Friendliness (or benevolence) problem directly.

For instance, the other comment brings up the uncertainty around giving an Oracle AI the task of finding what benevolence means to humans, to then plug into the final AI hoping it will be Friendly. You don't know how to precisely define 'benevolent', so let the Oracle AI do it (or help you do it), though you think you can at least program the Oracle AI to do a good job of finding answers to vague, complex, fragile problems. (Why? Past success in AI tech?) Is what the Oracle AI outputs actually good? How do you know? What did the Oracle AI have to do to reach its answer, or gain a last bit of certainty in it? (E.g., did it reason it needed more data and so build or hasten the arrival of brain-scanning technology, start scanning brains from volunteers, the very recently deceased, cryonics patients, or just without its creators' knowledge through black market deals, and at some point did it simulate trillions of human minds under all sorts of gruesome scenarios to test responses? Do you even morally care about sentience on silicon? Would you care more if you were an upload?) You've got a lot of problems just building an Oracle AI (or anything powerful enough to help you solve the hardest problems). Even supposing the output it gives is good and the costs (practical and ethical) were acceptable, will a seed AI allowed to recursively self-improve be guaranteed to preserve this Friendliness property in all subsequent improvements?

With the idea of using "AI tech", you've basically drawn a line in the outcome space that on one side says something like "let's just program the thing" and on the other says "hold on, let's just program the thing but also use all these other relevant computer tools we've developed over the decades to help program the thing". It's the start of an approach, but does it really prune all that much? What do these relevant tools actually buy you in safety, if a safety-guarantee-tool is not among them? Another line, which may be what you were heading towards, could say on one side something like "we might not be able to solve all the necessary problems with our current intelligence, so let's also spend time looking for ways to augment human intelligence -- brain-computer interfaces, brain emulations, tweaks to our genetic code, that sort of thing" and on the other side "no, we're capable, let's try our best right now". Using ems based on friendly-ish human minds that become better than human could very well be a better approach to solving the superintelligent problems correctly and efficiently than doing so at raw human levels. On the other hand, it could turn very bad if one of those ems instead just solves the problem of recursively improving itself and its goal and value system is insufficiently friendly. Or many other failure modes. (Though an interesting "most likely scenario", given certain assumptions and only looking at a span of 2 years, which by itself doesn't look too terrible, is Robin Hanson's Age of Em idea.) Just as with approaches involving molecular nanotech to help us, there are reasons why it could be seen to help the best outcome along, but it also opens the door to a lot of other risks that aren't necessarily there if you took the other side, so again, how much is really pruned?

What I take as your general idea of "maybe we need and should employ additional help and knowledge that we've yet to gain to even really start" isn't bad, though. It's much better than "Benevolence is easy, a King and his subjects all know he is a great ruler when everyone is smiling frequently"; it's just lacking the detail to constitute a plan. Throwing in the positive feedback loop idea (which might even manifest as impressive accelerating returns, who knows) seems to me only relevant to figuring out time scales. I don't think it says anything about the fraction of good vs. bad outcomes, apart from whether short or long time scales matter.

Since I most likely will have no impact on any plans, I'm only hoping for some optimistic predictions (a plausible bright future) and what mechanics would be involved. So no, I'm not expecting an airtight plan, just a hint at how it might look.

My general take is that, AI or not, the world is continuing to progress towards a future where extinction-level capabilities are not only within reach, but within reach of more and more people. It is, to me, not a question of if, but when, such capabilities will be in the hands of an individual or organization that, either by malice or incompetence, would be a serious problem.

Last year a pilot intentionally crashed a plane in an act of suicide, taking 150 passengers with him. Consider a like-minded person capable of engineering a virus.

So, my optimistic take on the future is that while such capabilities are developed, one would hope that capabilities to mitigate those threats are developed in parallel, so that when the problems arise the combined capabilities of all benign actors are sufficient to evade extinction.

In the case of AI it just seems implausible that any debate could manage to stop the technological race. Just look at the climate change debate: it has been going on for decades and is only now gaining political traction. AI isn't even on the radar yet...

So the plausible futures to me are either that AI will wipe us out, or not. In the event that it does not, I think one plausible future is that the potentially malign AI will operate in an environment where other AI, or near-AI systems, will also perceive it as a threat, hopefully aggregating into the capacity to suppress the dangers in a benign way.

Also, I believe any path from here to AI will consist of a complex system of AI-like technology interwoven with human systems, resulting in some aggregates with considerable capabilities of their own, not to be discounted.

In some sense I believe we can extrapolate how things will play out just by looking at some systems that might resemble AI today. Consider corporations: these are actors created from a complex system of legal, economic, social and other constructs. While ostensibly owned and operated by humans, the reality is that there is little any individual human can do to influence them in view of all the other forces directing their actions, making them, in a sense, already existing AIs.

Creating actual AI, as a useful thing, will probably involve similar complexity and influences. So in some sense there are probably similar patterns that will play out. Just as corporations can act to influence the world in ways malign to humans, so would AI systems. And just like corporations, the AI would find itself acting in an environment doing its best to contain the malice.