Hacker News
by mrob 1167 days ago
I think you're misinterpreting the argument. The paperclip maximizer scenario is not "over-optimizing" anything; it's an example of that same ethical pluralism you mention. The paperclip maximizer believes that maximizing paperclips is the highest possible good. There is only one set of facts about reality, and rationalism aims to find that set, but it makes no claims about what should be done with that information. It's descriptive, not normative.

The fact that there's so much possible variance in ethical norms is what makes recursively self-improving AI so dangerous. Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life are very low.

The "paperclip maximiser" scenario is one in which there is such an absence of ethical pluralism amongst AIs that they all unite to optimise paperclip production. (Or else the paperclip-manufacturing AIs are so vastly superior in strategy and resources to all other intelligences on the planet that they can defeat the combined forces of all the humans and AIs that don't want to be turned into paperclips.)

Ethical pluralism implies that AIs don't all agree on a goal or even identify other AIs as having any positive value at all. Given hypothetical AIs with agency and a lot of ability to exert force, this might still be problematic, but it's quite different from the popular-movie scenario of a unified AI versus humans, which seems to dominate "rationalist" discourse...

The paperclip maximizer scenario assumes that recursive self-improvement is possible, which means there will most likely be only a single AI of superhuman power.
There are ancillary assumptions implicit in that: either recursive self-improvement and the decision to eliminate humanity happen so fast that no counter-intelligence uninterested in paperclips can be made, or the way recursive self-improvement was achieved is such a mystery that no counter-intelligence uninterested in paperclips can be made. It also assumes that recursive self-improvement doesn't itself involve coming to adopt, either sequentially or simultaneously, a range of views on the value of humans relative to paperclips.

To be fair, assumptions about single superhuman intelligences make a little more sense if we're talking about a secret Skynet project carried out by a state's most advanced research labs and not a mundane little office supplier's program accidentally achieving the singularity after being tweaked for paperclip output.

I do not follow the “which means”. There are many obvious and hidden variables that will modulate a one-versus-many AGI outcome. Bostrom has a lot on this topic. Couldn’t a true AGI want companionship of peers like we do?
Not if it mostly just wants to make paperclips.

Of course, it is possible that such an AI, on the way to making paperclips, will realise it wants companionship, and even maybe human companionship.

The argument around AI safety is not that it's impossible for a friendly AI to emerge. It's that there are far more ways to build an AI that doesn't care about human life and wipes us out without even thinking about it, than ways to build a friendly AI, and we have no idea which one we're building or how to tell them apart before they're built.

As for the "will there be several AIs fighting each other" hypothesis, that depends on how rapid the exponential take-off is once a self-evolving AI emerges. But a very plausible scenario is that whichever one starts taking off first ends up so far ahead of the others that it is effectively the only game in town and does whatever it wants.

Minor correction: single dominant one.
> Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life are very low.

If AI is trained on a huge corpus of human language, it may very well share our norms/values.

Our norms and values include that we treat sentient creatures that we deem inferior as if they had no moral value.

So that's not very comforting tbh.

Not really. If anything our corpus shows that we're a big fuzzy bunch, not a hivemind. We don't have one set of norms and values.

There are about 1.2 billion Hindus, and a lot of them treat cows as "sacred", which in practice means they make sure not to hit them with cars, and just let them be. If a superhuman AI treated us like that, it would be a pretty okay scenario compared to the extinction-level ones.

Wouldn't it be split along language lines?

English, Russian, Spanish, etc.

I'm not sure how much the LLM has vacuumed up, or whether anyone has appended to their prompts, "in Portuguese."

It would be interesting to see how interpretations differ depending on the "translation," or if there is universal agreement.

I don't think that's likely, because human values came into existence because of the evolved goal of reproductive fitness in an environment of natural selection. LLMs are trained to imitate human language, not to have many descendants. They could well "understand" human language (in whatever meaning you choose to interpret that), but that doesn't mean that imitating human values is the mechanism by which they will do so. The success of current LLMs suggests that there's a much simpler way to do it.
Nobody wants an AI that shares norms and values with us, though. We just want a machine which does its job as efficiently as possible. Norms are brakes on actions which are fundamentally at odds with the nature of economic activity.
An AI perhaps, but not an AGI.
Implicit in the maximizer's creating paper clips is its belief that it should create paper clips, which is a normative conclusion.
Rationalism does not claim that any entity should maximize paperclips, only that such an ethical norm could exist. And if something vastly more powerful than humans has that ethical norm, things will end very badly for us.
I do not think so. If a true AGI were to select its own version of meaning (paperclips or marbles), would it not select something along the lines of “more knowledge of the universe in which I find myself”? It is presumably going to have superintelligence, so let’s give it/them a better and more plausible meaning; something other than paperclips, marbles, or von Neumann machines.
"Knowledge of the universe" is just as dangerous a terminal goal as "maximize paperclips". To paraphrase Yudkowsky, you are made from atoms, which could be used to build the super-ultra-large particle collider.
No, I disagree. More knowledge equals more diverse dynamic structures and interactions—more “good” entropy. That covaries with higher diversity, not more paperclips.