by nutjob2 2891 days ago
It seems the fundamental problem with bottom-up/learning AI is that it is opaque and essentially unknowable. I find it all very hackish. We can now develop systems that we can test and that seem to work, but we don't know exactly why they work (e.g., what parts of the training data they are promoting) or when (or why) they will fail. The effectiveness of adversarial inputs to trained vision systems illustrates this.

Zoom forward to a super-human AI that mimics our brains in its approach but exceeds their capacity. What is stopping it from learning, for instance, that it can play the long game of being good until it has sufficient power at its disposal and then becoming evil? No matter what training data you present, you can't know exactly what the result will be.

I get the feeling that learning systems will be combined with model systems, with the former performing "low level" tasks and the latter providing a verifiable "executive" that guides high-level goals or outcomes.


One approach being considered is "AI Safety Via Debate"[0], which hopes to prevent deception by carefully constructing games in which a superhuman agent's best strategy is honesty. Note that this is the goal; much work to be done!

[0] https://arxiv.org/abs/1805.00899

Forget AIs - we need this for humans to design legal and administrative systems.

I have wondered whether formalized incentive-based design could be a workable field: structuring systems so that even a complete sociopath would find acting in a beneficial way the best option.
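
A toy illustration of that kind of incentive design, offered as a sketch rather than anything the comments here propose: in a sealed-bid second-price (Vickrey) auction, a purely self-interested bidder can never do better than bidding their true value. The function names, the tie-breaking rule, and the small integer grid below are all assumptions chosen for illustration.

```python
from itertools import product


def second_price_utility(value, bid, highest_other_bid):
    """Payoff to one bidder in a sealed-bid second-price auction.

    Assumption: the bidder wins only with a strictly higher bid
    (ties lose) and then pays the highest rival bid.
    """
    if bid > highest_other_bid:
        return value - highest_other_bid
    return 0


def truthful_is_weakly_dominant(grid):
    """Brute-force check: no bid ever beats bidding one's true value."""
    for value, bid, rival in product(grid, repeat=3):
        honest = second_price_utility(value, value, rival)
        deviating = second_price_utility(value, bid, rival)
        if deviating > honest:
            return False
    return True


# Check every combination of value, bid, and rival bid from 0 to 10.
print(truthful_is_weakly_dominant(range(0, 11)))
```

The check only covers this one small game on a finite grid; it says nothing about whether comparable guarantees can be built for legal systems or for a superhuman agent.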

Do we know game theory well enough to structure such games with no theoretical way for an AI to sneak out? I doubt that, but even so, funny things start happening when theory meets practice. I recall the example of quantum entanglement, which (I read) enables communications that cannot be spied upon without the intended parties knowing. Except that (I also read) it was attacked at the interface between the quantum and classical domains. The world is complex, and superhuman AI is by definition better equipped to find loopholes than humans are.
Unfortunately, being dishonest or evil is just one example. Arguably the AI can develop new classes of deviancy, abuse or maladaptation that we haven't conceptualized yet. If we supersize the ability, surely we supersize the problems.

It leads to a scary question: what does a superhuman AI really want?

To be fair, an HFT agent can technically count as superhuman AI. Wanting isn't a thing that applies yet to actual AI, and there is no special sauce that indicates advancement beyond neuron scale. Barring directives, and assuming it is "grown", what it wants can be utterly peripheral to rationality and is likely based on what it is taught, intentionally or not. Look at how society preaches honesty from a young age and then starts teaching lying again by rewarding it. The real lesson is the Spartan one on stealing: don't get caught. It may not be intended, but it is the result.
What does a human that is much smarter than you really want? It's a fundamental philosophical problem that hasn't been solved.
> which hopes to prevent deception by carefully constructing games in which a superhuman agent's best strategy is honesty

I'd be very hesitant to assume that an agent cannot learn under which circumstances it should be honest to gain a benefit without putting any innate value on honesty. A human agent is more than capable of reasoning like that, let alone a superhuman one.