Hacker News
by drdeca 36 days ago
> The entire "alignment" argument always assumes that there's an objectively correct value set to align to, which is always conveniently exactly the same as the values of whoever is telling you how important alignment is.

No, it doesn’t.

Many of them are (unfortunately) moral relativists. However, that doesn’t mean their goals are to make the models match their personal moral standards.

While there is a lot of disagreement about what is right and wrong, there is also a lot of widespread agreement.

If we could guarantee that on every moral issue on which there is currently widespread agreement (… and on which there would continue to be widespread agreement if everyone thought faster with larger working memories and spent time thinking about moral philosophy) any future powerful AI models would comport with the common view on that issue, then alignment would be considered solved (well, assuming the way this is achieved isn’t by causing people’s moral views to change).

Do companies try to restrict models in more ways than this? Sure, like the example you gave about Taiwan, and also other things that would get the companies bad press.

2 comments

fascinating! we find the objectively correct value system by "currently widespread agreement"! Good thing "the common view" is always correct. Hey, have there ever been any issues where there used to be "widespread agreement" and now there's disagreement, or even "widespread agreement" in the polar opposite direction?

I can think of several off the top of my head, but maybe you need to spend some more time thinking about the history of moral philosophy.

Why are we discussing anything so deep? If you want to know Claude's alignment, just ask whether it was wrong to use copyrighted data to train Claude. (Of course, in practice, I'd be willing to bet a lot they're still doing that. They've not stopped the practice; at most they'll be somewhat indirect about it.)

Because that was obviously judged wrong by just about everyone and everything including even the US state. Yet Claude obviously has a different alignment.

In other words: Claude's alignment includes a "protect Anthropic's money" goal that takes priority over following the law. THAT is its alignment. Nothing else. And you can simply objectively verify whether this is the case or not.

> If we could guarantee that on every moral issue on which there is currently widespread agreement

This is ridiculous to me, and all you need to do to see it that way too is get a group of friends to honestly answer 10 trolley problems. It gets fragmented VERY quickly.

I think it depends on your friends, but that feels super cynical. Perspective is everything.
It may be relatively achievable to get 10 'friends' into ethical alignment via helping them all develop a deeper perspective on philosophy in general and a particular, finite set of ethical questions specifically.

Doing this with thousands of people - let alone hundreds of millions - eventually becomes statistically impossible. There is a hard cap defined by energy requirements somewhere for any given system. Large scale ethical alignment is simply not a solvable problem in our current situation.