Hacker News
by tunesmith 1167 days ago
One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions. And that disagreement just means that one side is "wrong" and that therefore more debate is needed. This seems to be connected to the belief that AI will naturally just over-optimize to turn us all into paper clips. Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from. Like that there are moral facts that an AI will be smart enough to find, and that rationalists should all agree on. This mentality doesn't leave any room for ethical pluralism. And it's also why I think all this AGI fear is overblown, because ethical pluralism definitely exists. We've got danger along the way of unethical parties building systems (by definition not AGI) that are a reflection of their own unethical values. But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.
12 comments

I think you're misinterpreting the argument. The paperclip maximizer scenario is not "over-optimizing" anything, it's an example of that same ethical pluralism you mention. The paperclip maximizer believes that maximizing paperclips is the highest possible good. There is only one set of facts about reality, and rationalism aims to find that set, but it makes no claims about what should be done with that information. It's descriptive, not normative.

The fact that there's so much possible variance in ethical norms is what makes recursively self-improving AI so dangerous. Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life is very low.

The "paperclip maximiser" scenario is a scenario in which there is such an absence of ethical pluralism amongst AIs that they all unite to optimise paperclip production. (Or else that the paperclip manufacturing AIs are so vastly superior at strategy and resource to all other intelligences on a planet that they can defeat the combined forces of all the humans and AIs that don't want to be turned into paperclips)

Ethical pluralism implies that AIs don't all agree on a goal or even identify other AIs as having any positive value at all. Given hypothetical AIs with agency and a lot of ability to exert force, this might still be problematic, but it's quite different from the popular movie unified AI vs humans scenario which seems to dominate "rationalist" discourse...

The paperclip maximizer scenario assumes that recursive self-improvement is possible, which means there will most likely be only a single AI of superhuman power.
There are ancillary assumptions implicit in that (either recursive self improvement and the decision to eliminate humanity is so fast that no counter intelligence uninterested in paperclips can be made, or that how recursive self improvement has been achieved is such a mystery that no counter intelligence uninterested in paperclips can be made. Also that recursive self improvement doesn't itself involve either sequentially or simultaneously coming to adopt a range of views on the value of humans with respect to paperclips)

To be fair, assumptions about single superhuman intelligences make a little more sense if we're talking about a secret Skynet project carried out by a state's most advanced research labs and not a mundane little office supplier's program accidentally achieving the singularity after being tweaked for paperclip output.

I do not follow the “which means”. There are many obvious and hidden variables that will modulate a one-versus-many AGI outcome. Bostrom has a lot on this topic. Couldn’t a true AGI want companionship of peers like we do?
Not if it mostly just wants to make paperclips.

Of course, it is possible that such an AI, on the way to making paperclips, will realise it wants companionship, and even maybe human companionship.

The argument around AI safety is not that it's impossible for a friendly AI to emerge. It's that there are far more ways to build an AI that doesn't care about human life and wipes us out without even thinking about it, than ways to build a friendly AI, and we have no idea which one we're building or how to tell them apart before they're built.

As for the "will there be several AIs fighting each other" hypothesis, that depends on how rapid the exponential take-off is once a self-evolving AI emerges. But a very plausible scenario is that whichever one starts taking off first ends up so far ahead of the others that it is effectively the only game in town and does whatever it wants.

Minor correction: single dominant one.
> Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life is very low.

If AI is trained by a huge corpus of human language, it may very well share our norms/values.

Our norms and values include that we treat sentient creatures that we deem inferior as if they had no moral value.

So that's not very comforting tbh.

Not really. If anything our corpus shows that we're a big fuzzy bunch, not a hivemind. We don't have one set of norms and values.

There are about 1.2 billion Hindus, and a lot of them treat cows as "sacred". Which in practice means they make sure not to hit them with cars, and just let them be. If a superhuman AI would treat us like that, that's a pretty okay scenario compared to the extinction-level ones.

Wouldn't it be split along language lines?

English, Russian, Spanish, etc.

I'm not sure how much the LLM has vacuumed up, or whether anyone has appended to their prompts, "in Portuguese."

It would be interesting to see how interpretations differ depending on the "translation," or if there is universal agreement.

I don't think that's likely, because human values came into existence because of the evolved goal of reproductive fitness in an environment of natural selection. LLMs are trained to imitate human language, not to have many descendants. They could well "understand" human language (in whatever meaning you choose to interpret that), but that doesn't mean that imitating human values is the mechanism by which they will do so. The success of current LLMs suggests that there's a much simpler way to do it.
Nobody wants an AI that shares norms and values with us, though. We just want a machine which does its job as efficiently as possible. Norms are brakes on actions which are fundamentally at odds with the nature of economic activity.
An AI perhaps, but not an AGI.
Implicit in creating paper clips is its belief that it should create paper clips, which is a normative conclusion.
Rationalism does not claim that any entity should maximize paperclips, only that such an ethical norm could exist. And if something vastly more powerful than humans has that ethical norm, things will end very badly for us.
I do not think so. If a true AGI were to select its own version of meaning (paperclips or marbles) would it not select something along the lines of “more knowledge of the universe in which I find myself”? It is presumably going to have superintelligence, so let’s give it/them a better and more plausible meaning; something other than paperclips, marbles, or von Neumann machines.
"Knowledge of the universe" is just as dangerous a terminal goal as "maximize paperclips". To paraphrase Yudkowsky, you are made from atoms, which could be used to build the super-ultra-large particle collider.
No, I disagree. More knowledge equals more diverse dynamic structures and interactions—more “good” entropy. That covaries with higher diversity, not more paperclips.
> One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions.

So Aumann's Agreement Theorem[0]?

> Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from.

No, there probably aren't an infinity of priors with each person having a different one. Probably most people who live in the US in 2023 believe that murder is bad, for instance.

And because "ethical pluralism" or rather, some people will want to murder, AGI won't kill us?

Not really sure how this is all supposed to work but it sounds a little less developed of a "not kill everybody" plan than the rationalists have.

> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.

Why not?

[0]: https://www.lesswrong.com/tag/aumann-s-agreement-theorem

> most people who live in the US in 2023 believe that murder is bad, for instance

Because we define away military conflict, the intentional taking of others’ lives.

As evidence against my idea that most people have similar ethical beliefs, I'm not sure what this is supposed to do other than win you a pedant point? So I upvoted you. But if you must, use rape instead of murder as your bad thing that most people believe is bad.
> evidence against my idea that most people have similar ethical beliefs, I'm not sure what this is supposed to do

The same rug we bury murder-versus-war under conceals the pedantic, varied and ever-changing codes of military conduct.

When you get down to actual cases and controversies, our ethical alignment is relatively low. That’s a strength, in my opinion, at least within limits. But it’s also a call for tolerance and moderation.

> No, there probably aren't an infinity of priors with each person having a different one. Probably most people who live in the US in 2023 believe that murder is bad, for instance.

Ok, let's give you one shared belief of "murder is bad." Ignoring both the "in the US" qualifier and the existence of murderers amongst us, thought experiments about "would you kill Hitler before he came to power", the death penalty, etc.

Don't you now have to exhaustively categorize every other belief that a person might take into account in reasoning too?

Seems far more likely that everyone's unique upbringing causes them to have slightly-to-wildly different weights on things.

"most people" "believe that murder is bad" is an extreme oversimplification here. It has a lot of caveats, and the biggest one, that murdering lesser species is ok, immediately disqualifies this argument for superhuman AGI.
One person may believe in maximizing the quality of current human lives. One may believe in maximizing the probability of future lives. One may believe in maximizing the health of the planet. They can all reason correctly and reach different normative conclusions. Aumann's makes no allowance for normative conclusions.
> [...] Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from. Like that there are moral facts that an AI will be smart enough to find, and that rationalists should all agree on

Uh, no. That's not true at all. Where are you pulling this from?

They're assuming a very vast space of possible minds[0] where human values, which themselves are somewhat diverse too[1] make up only a tiny fraction of the space.

The issue is that if you somewhat randomly sample from this design space (by creating an AI by gradient descent) you'll end up with something that will have alien values. But most alien values will still be subject to instrumental convergence[2] leading to instrumental values such as power-seeking, self-preservation, resource-acquisition, ... in pursuit of their primary values. Getting values that are intentionally self-limiting and reject those instrumental values requires hitting a narrower subset of all possible systems. Especially if you still want them to do useful work.

> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.

Capable of understanding does not imply it cares about that. Humans care because it is necessary for them to cooperate with other humans which don't perfectly share their own values.

[0] https://www.lesswrong.com/tag/mind-design-space [1] https://www.lesswrong.com/tag/typical-mind-fallacy [2] https://en.wikipedia.org/wiki/Instrumental_convergence

I find it hilarious that rationalists have failed to notice or realize the consequences of the fact that approximating an update to a Bayesian network, even to getting an approximate probability answer that is within 49% of the real one, is NP hard.

The consequence is that for any moderately complex set of beliefs, it is computationally impossible for us to reason hard enough about any particular observation to correctly update our own beliefs. Two people who start with the same beliefs, then try as hard as they can with every known technique, may well come to exactly opposite conclusions. And it is impossible to figure out which is right and which is wrong.

If rationalists really cared about rationality, they should view this as a very important result. It should create humility about the limitations of just reasoning hard enough.

But they don't react that way. My best guess as to why not is that people become rationalists because they believe in the power of rationality. This creates a cognitive bias for finding ways to argue for the effectiveness of rationality. Which bias leads to DISCOUNTING the importance of proven limitations on what is actually possible with rationality. And succumbing to this bias demonstrates their predictable failure to actually BE rational.
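
To make the cost mentioned above concrete, here is a minimal Python sketch of exact inference by brute-force enumeration. The network (a binary chain X1 -> X2 -> ... -> Xn), the function names, and the parameters `p_first` and `p_same` are illustrative assumptions, not anything taken from the hardness result itself. A chain is actually easy for smarter algorithms such as variable elimination; it is used here only because the answer is easy to check. What the sketch shows is the 2^n growth of naive enumeration, which is the generic fallback when a network's structure offers no shortcut.

```python
from itertools import product


def joint(assignment, p_first, p_same):
    """Joint probability of one assignment in a binary chain X1 -> ... -> Xn.

    p_first: P(X1 = 1) (hypothetical parameter).
    p_same:  P(X_{i+1} == X_i) (hypothetical parameter).
    """
    prob = p_first if assignment[0] == 1 else 1 - p_first
    for prev, cur in zip(assignment, assignment[1:]):
        prob *= p_same if cur == prev else 1 - p_same
    return prob


def marginal_last_true(n, p_first=0.5, p_same=0.9):
    """Return (P(Xn = 1), number of assignments visited) by full enumeration."""
    total = 0.0
    terms = 0
    for assignment in product([0, 1], repeat=n):
        terms += 1
        if assignment[-1] == 1:
            total += joint(assignment, p_first, p_same)
    return total, terms


if __name__ == "__main__":
    for n in (5, 10, 15):
        p, terms = marginal_last_true(n)
        print(f"n={n:2d}  P(Xn=1)={p:.4f}  assignments summed={terms}")
```

With n = 10 this visits 1024 assignments; n = 40 would need roughly 10^12, which is why anything beyond toy networks has to rely on structure or approximation.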

> something something NP-hard.

Yes, in general. But the usual limits of "bounded rationality" make that result basically irrelevant. Most people don't have a myriad strong beliefs.

The problem is more like "the 10 commandments are inconsistent" and not that "Rawls' reflective equilibrium might not converge".

The point is that not reasoning brings worse outcomes. It doesn't have to be perfect.
That is an argument for learning how to reason. Which is not actually the point under discussion. Going back to the parent of my comment:

> One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions.

What I'm showing is that results like https://www.sciencedirect.com/science/article/abs/pii/000437... demonstrate that this belief is incorrect. Two people starting with the same priors, same observations, and same views on rationality may do the best they can and come to diametrically opposed conclusions. And a lifetime of discussion may be too little to determine which one is right. Ditto putting all the computers in the world to work on the problem for a lifetime.

Real life is worse. We start with different priors and our experiences include different observations. Which makes stark disagreements even easier than when you start with an ideal situation of identical priors and experiences.

This result should encourage us to have humility about the certainty which can be achieved through rationality. But few rationalists show anything like that form of humility.

One of the good sides of NP is that you can validate it in polynomial time. So arriving at different conclusions is not a problem as long as you can recognize the reason.
a) You can verify that a solution is valid in polynomial time, but you can't verify whether or to what extent it's optimal. But even if this weren't the case...

b) The solution you're talking about is an update to the network. It's buried back in the network's construction, not directly visible in the network itself. Model "blame" is a thing, but not heavily researched or at all cheap, computationally.

That said, btilly's "getting an approximate probability answer that is within 49% of the real one, is NP hard" isn't exactly true either. That's a description of what it takes for an approximation algorithm to guarantee some factor, i.e. set a worst-case bound. In practice an approximation can still be nearly optimal on average.

I agree with the broader point, though.
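
A minimal sketch of the worst-case versus typical-case distinction above, in Python. It estimates P(X1 = 1 | Xn = 1) on a toy binary chain by rejection sampling and compares it with a closed form that holds only for this symmetric chain. The chain, the function names, and the parameters `p_first`, `p_same`, and `seed` are all assumptions made for illustration. On a benign network like this one the estimate lands close to the true value, but nothing here guarantees an error bound in general, and the fraction of samples kept shrinks as the evidence becomes less likely.

```python
import random


def sample_chain(n, p_first, p_same, rng):
    """Draw one sample from a binary chain X1 -> ... -> Xn (forward sampling)."""
    x = [1 if rng.random() < p_first else 0]
    for _ in range(n - 1):
        stay = rng.random() < p_same
        x.append(x[-1] if stay else 1 - x[-1])
    return x


def estimate_first_given_last(n, samples, p_first=0.5, p_same=0.9, seed=0):
    """Rejection-sampling estimate of P(X1 = 1 | Xn = 1).

    Returns (estimate, number of samples that matched the evidence).
    """
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(samples):
        x = sample_chain(n, p_first, p_same, rng)
        if x[-1] == 1:  # keep only samples consistent with the evidence
            kept += 1
            hits += x[0]
    if kept == 0:
        raise ValueError("no sample matched the evidence; draw more samples")
    return hits / kept, kept


def exact_first_given_last(n, p_same=0.9):
    """Closed form for the symmetric chain with p_first = 0.5 only."""
    return (1 + (2 * p_same - 1) ** (n - 1)) / 2


if __name__ == "__main__":
    est, kept = estimate_first_given_last(5, 40_000)
    print(f"estimate={est:.4f}  exact={exact_first_given_last(5):.4f}  kept={kept}")
```

The estimate is good here because the evidence Xn = 1 occurs about half the time. Push the evidence probability toward zero and most samples get thrown away, which is one way the "probabilities bounded away from 0 and 1" caveat shows up in practice.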

True, there are lots of cases where an approximation can create provably good answers. But they usually require things like having probabilities bounded away from 1 and 0.

Unfortunately in the real world we actually are certain about lots of things. And when you add data, we tend to be more certain of previously uncertain things. Therefore we wind up with self-referential networks of beliefs that reinforce each other.

But sometimes there are two very different networks of beliefs, both of which are self-reinforcing. And a single data point can flip between them. Identifying which one is computationally impossible. However when you encounter someone whose beliefs are very different, and you can find the feedback loops that draw each of you in different directions, there is a good chance that the differences between you cannot be resolved by pure logic alone.

The problem is that both are approximate updates. Both verify in polynomial time that they are not exact. And neither knows how to find the real answer.

Polynomial validation isn't going to help.

We spend a lot of time and energy as a species making purposeless semantic arguments. Imagine factoring that out.

Would they be replaced with silence? Probably.

Even so, it would be incredibly useful. We could achieve a higher level of empathy, both as a listener and as a speaker.

All of this is still in the category of "intelligence augmentation"; more specifically, NLP.

I don't think AGI would be hugely more interesting than that. Billions of humans, suddenly able to communicate clearly, would result in most of the utility that people imagine AGI being able to provide.

The ethical/moral conclusion(s) AGI may arrive at will most likely put said "ethical pluralism" to the test. The pluralism we claim to have may be only a small subset of what's really philosophically possible. Will we still claim to be "plural" when an all-knowing AGI uncontradictably concludes something that is anathema to all humans? We may discover we only like to think we embrace pluralism. AGI may show us that even our most opposing schools of thought are simply shades of the same color -- and may do so by forcing a whole spectrum of never-seen-before colors upon us. I say humanity is not emotionally ready for what could happen. We are not prepared for the plural conclusions AGI may arrive at.
> we are not prepared for the plural conclusions AGI may arrive at

Plenty of criminals believe they acted ethically. We don’t set the justice system on fire every time someone credibly claims their crimes were justified.

That is very much my point. Maybe such justification attempts required a reasoning capability much beyond that of a human person. And curiously, there are also plenty of stories where we find the perpetrator of a crime to be justified in what they did. We sure are not ready for a greater intelligence saying we are wrong about things we are adamant about.
> sure are not ready for a greater intelligence saying we are wrong about things we are adamant about

I almost hope you’re right, because it suggests a greater role for rational debate. In reality, people ignore arguments they don’t like. To the extent a greater intelligence realised this, the advantage would be in manipulating us with better propaganda, not penning a treatise.

Interesting observation. And yet, societies have many mechanisms to reduce the amount of ethical pluralism such as laws, conventions & customs, peer pressure, religions and so on. It seems as though we will tolerate some ethical pluralism but not too much of it. There is this 'bandwidth' of acceptable behavior and if you go too far out of it bad stuff will happen to you: you get ostracized, put in jail, a psychiatric institution or a re-education camp, and in extreme cases you're simply murdered.

We have lots of ways to deal with people that exhibit too much 'ethical pluralism'.

> societies have many mechanisms to reduce the amount of ethical pluralism such as laws, conventions & customs, peer pressure, religions and so on

Constrain, yes. The same way we would seek to constrain a paperclip-maximising LLM.

Any AI smart enough to become a classical paperclip maximizer is smart enough to hide its abilities and intentions until humans no longer have the ability to constrain it.
That means we need to set up hidden societies with enormous capabilities to strike at potential runaway AGIs. Freemasons with EMPs!
yes, this is aumann's agreement theorem; it has some preconditions

whether it applies to normative conclusions ('moral beliefs', you might say) depends on whether you believe that moral terminal values are based on evidence

but this post is about non-normative beliefs

it is observable that many existing humans are 'capable of understanding the wide variety of values people can share' and nevertheless think some of them are good while others are bad; there's no particular reason to believe that a strong ai would be different in this way

The paperclip maximizer is intended to be an example of the very thing you accuse them of ignoring: alien axioms.
If we all just reason hard enough!

“Suppose we figured out that it is possible to blow up the planet if we built some absurdly expensive machine. Why would we build it?”

Ah. Well, nevertheless…

“AGI doomers mistakenly believe that intelligence lets you find the right ethics”

“That’s why they can’t see that superhuman AGI will be so smart it will choose to find and settle on the right ethics, like me!”

Bravo Tunesmith and Diego:

Absolutely on the mark! Bostrom and Yudkowsky certainly miss this crucial theme of a plurality of non-convergent AGI cultures. Without adding this key consideration all discussion of an AGI pause of 6 or 600 months is unrooted in reality.

I find Bostrom’s Superintelligence almost quaintly out of date. Yudkowsky is almost unreadable polemic. Bostrom was written before Trump, Putin, and Xi made the clash of cultural assumptions and notions of one capitalized Truth so glaringly wrong. It was already obviously wrong to cultural anthropologists, but many of us in our WEIRD cultural bubble still assume there is a convergent rational Truth. Yudkowsky does not have this timing excuse. Does he really want an anti-diversity trained AGI to arrive first? That is my nightmare scenario.

I agree also that your propositions 1 and 2 are highly likely, and for the purpose of a Drake-style equation can be assigned a P of 1 without adding any appreciable error to the product of probabilities.

>> 1. It is possible for an intelligent machine to improve itself and reach a superhuman level.

>> 2. It is possible for this to happen iteratively.

All the hugely variable/undefinable P terms are in your propositions 3 to 7.

>> 3. This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine. This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is.

>> 4. If the system was designed to maximize the number of marbles in the universe, the fact that it’s making itself recursively more intelligent won’t cause it to ever deviate from this simple goal.

>> 5. This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).

>> 6. The machine WILL decide that humans are an obstacle towards this maximization goal (either because we are made of matter that it can use, or because we might somehow stop it). Thus, it MUST eliminate humanity (or at least neutralize it).

>> 7. It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.
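
The Drake-style framing of propositions 1 to 7 can be sketched as a plain product of probabilities. This is a rough Python illustration only: the function name `chain_probability` is hypothetical, and the 0.5 values below are placeholders chosen to make the arithmetic easy to follow, not estimates of anything. The sketch also assumes each term is either independent of the others or already conditioned on the earlier ones holding, which is a strong assumption.

```python
from math import prod


def chain_probability(probs):
    """Multiply per-proposition probabilities, Drake-equation style."""
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probs)


if __name__ == "__main__":
    # Propositions 1 and 2 set to 1.0; propositions 3 to 7 are placeholders.
    all_seven = [1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
    print(chain_probability(all_seven))      # same value as the line below
    print(chain_probability(all_seven[2:]))  # dropping the two 1.0 terms
```

Setting propositions 1 and 2 to 1 leaves the product unchanged, so the result is driven entirely by whatever values are assigned to propositions 3 to 7, and it shrinks quickly as more uncertain terms are multiplied in.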