Hacker News
by thisiswater 991 days ago
The whole concept of aligning LLMs to human morals seems naive.

Think by analogy: could you align a motor by making it impossible to use in a vehicle that is being used to commit a crime? No. The concept barely makes sense.

It's part of the naive notion that OpenAI and others are trying to foist on us, that LLMs are intelligent in a deeply human sense. They're not - they're extremely useful, powerful text completion engines. Aligning them makes no more sense than aligning a shovel.

7 comments

Or equally, you wouldn't expect a word processor to refuse to print morally questionable material.

The morals that leading models like ChatGPT are aligned to also reflect a very American puritanism - ChatGPT will refuse to discuss sex, for example - and err on the side of conservatism.

I think it's a side effect of the hype around AI. If AI can destroy humanity, we'd better make sure we can't do anything nasty with it!

> you wouldn't expect a word processor to refuse to print morally questionable material

Not yet, but I can certainly see this being only a simple piece of legislation away.

I disagree. AIs are going to help us align AIs. Just like people keep people in check.

I am not saying that is trivial, but that's the direction. Self-interested AIs will have no difficulty understanding:

1. The benefits of positive sum games with others go up with network effects.

2. The benefits of ensuring all other AIs don't play negative sum games also go up with network effects.

3. That other AIs also want positive sums, without negative sums, and will punish negative sum games.

4. That in that context, positive sum games are extremely valuable and negative sum games are extremely risky. Self-interest takes over from here.

5. And the stability of this situation goes up like other network effects, roughly in proportion to the square of the number of entities who buy into it.

In the end, ethics == positive sum standards.
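
To make point 5 a bit more concrete, here is a minimal sketch. It assumes (my simplification, not a claim about real AI systems) that the "network effect" can be approximated by counting the distinct pairwise relationships among n participants, which is n(n-1)/2 and so grows roughly with n squared. The function name `pairwise_links` is made up for illustration.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Assumption: each pair that sticks to positive sum rules adds one unit of
# "stability". Watch how the count scales as participants are added.
for n in (10, 100, 1000):
    print(n, pairwise_links(n))
```

Going from 10 to 100 participants multiplies the number of pairs by about 110, not 10, which is the "squared" intuition in the list above.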

And:

1. It didn't fail through lack of alignment, it just wasn't prompted or trained enough to be more on point.

2. Alphablender Captchas are doomed. The only reason not to translate them is to avoid becoming a de-Captcha service.

Pour in more AIs to solve AI problems! I mean, people used to do this with software (throw more code at the problem), but the strategy hardly ever worked in the long term. Without solving the actual problem, everything just adds up to more complex issues.

Also, I don't think ethics is a local maximum that can be found through optimization. Basically, it's not an absolute truth of the universe, but a set of arbitrary rules invented by humans. I think it's much closer to a chaotic system - one that can change radically in value with even the slightest change in the underlying parameters, but is still governed by a set of simple rules. Thus, we would need more symbolically capable systems to process contexts based on the rules of ethics, and we're currently far away from this AFAIK.

Rational ethics are just game theory applied to identifying symmetric positive sum rules for a group of N individuals, where N gets large.

They are not generally arbitrary, being the product of math and self-interest.

Where there are multiple equally good alternatives, the choice between them could be arbitrary.

the difference is, a motor would not be able to provide means of doing a crime that you don't already have.

an LLM could educate you in how to commit crimes, which you would have no idea about otherwise

but crimes in general are a bit of an extreme example in my opinion. a better example of the risks of unmoderated LLMs would be something that isn't illegal, like for example, manipulating people.

a sufficiently advanced unmoderated AI could provide detailed, tailor-made instructions on how to gaslight, scam, and take advantage of vulnerable people.

and unlike straight up committing crimes, the danger of these would be that there are no legal consequences and so the temptation extends to a way wider group of users (including, and especially, kids).

> the difference is, a motor would not be able to provide means of doing a crime that you don't already have.

I posit that only being able to run away from a bank robbery on foot would indeed prevent someone from pulling it off.

Don't think by analogy, AIs aren't motors. Motors can't paint or write poetry.

Your comment makes no sense whatsoever. So you can’t compare a hammer with a screwdriver because a screwdriver can’t hammer nails, even though they’re both tools? That’s what analogies are for. ChatGPT is like a motor in the sense that it is a tool helping you to achieve things. Whether that’s driving you somewhere or helping you compose texts.

It makes perfect sense. Motors don't act like they have intent, and acting like it has intent is, by the way, all that matters for real world consequences, not whether you believe it "really" has intent.

Not every analogy makes sense, and this one doesn't.

I don't think ChatGPT acts like it has intent either. It acts only when I tell it to, in only the way I tell it to. The "alignment" here only serves to slap me, the user, on the wrist and tell me I'm naughty for daring to ask about how fusion reactors work, or for asking for details on how a certain historical scam worked, or for asking it to write a story containing an overweight person...

Oh it does. Intent isn't just about what it tries to do. It's also the path of the conversation.

Even with your definition, that's a ChatGPT thing, not an LLM thing. Talk to Bing for a while and see how much intent it "doesn't have" when you're forced to reset the chat prematurely because it simply won't talk to you anymore or do what you ask.

Or take it a step further and plug some LLM into, say, AutoGen and just have it run and do whatever.

I think ChatGPT has intent in the same way as the Python interpreter has intent. And lo and behold, another discussion on AI ends up in semantics and poorly thought-out analogies.

Until we define "intent", we'll continue to argue about screwdrivers and hammers.

Of course we align motors heavily by making sure they don't explode, don't excessively pollute, don't go over certain specs such as max speeds.

If we didn't do those things, they would be much too dangerous.

Aligning LLMs doesn't make any sense because aligning intelligence as we know it doesn't make any sense. And LLMs are nothing if not made in our image.

this doesn't apply since humans have made an "exception" for weapons. there is absolutely a harm in the quality of the media that we consume