by sillysaurusx 979 days ago
I’ll match your opinion with an opinion of my own: it’s far more likely that an AGI will be aligned by default than not. It’s trained on human data. You’re making it sound like it’s going to pop into existence after having evolved on another planet, which is pure fiction.

Plenty of human cultures feel alien to each other. The recent war is one unfortunate example. Yet on the whole, it works out.

Something trained on the totality of human knowledge will act like a human. And if it somehow doesn’t, it won’t be tolerated. (I’d personally tolerate it, but it’s obvious that the world won’t stand for that.)

1 comments

> Plenty of human cultures feel alien to each other. The recent war is one unfortunate example. Yet on the whole, it works out.

I contest that. Which war do you have in mind here? The Russian invasion of Ukraine? The two peoples are about as aligned as you could possibly get - they're neighboring societies with so much shared history that they're approximately the same people. They even shared a common language until recently. This is not a war between people alien to each other - this is a war between nation states.

Note: I'm explicitly excluding political views and national/cultural identity from alignment, because those are transient and/or group-level phenomena. By human-to-human alignment, I mean empathy, a sense of right and wrong, conscience, patterns of thinking, all the qualities that let us understand each other and empathize with each other (if we care to try). Concepts like fear, love, fairness, and the contexts in which they're triggered. The basics. Those are all robust, hardwired in biology or by the intersection of our biology, shared environment and game theory.

The way I would rank it, if 25 = alignment coordinate of an average American, then the average Ukrainian and the average Russian would both be within 25 +/- 0.05. Maybe an average Sentinelese would be within +/- 0.5 of that. Whereas I'd expect an AI we create now to land anywhere between -20 and +40, on the scale of -100 to 100. I'm pulling the numbers out of my butt, they're just to get the relative magnitudes across.

> Something trained on the totality of human knowledge will act like a human.

Maybe, but that would have to include much more than the limited modalities we're feeding AI models now.

> And if it somehow doesn’t, it won’t be tolerated. (I’d personally tolerate it, but it’s obvious that the world won’t stand for that.)

Sure, but the issue here is to figure out how to make an aligned AI before we make an AI that's powerful enough to challenge us.

People seem to focus on the AI we have now in these threads, which I guess is a whole lot easier than speculating about the alignment of something that could end up being a whole lot smarter than you and able to take in far more types of information than you ever will.

Personally I don't see any way to make something that is superhuman and aligned other than by its own choice. How to make something that is beyond us and have it conclude that it shouldn't drive us extinct will be interesting enough, since, as your example above shows, we are real jerks to each other already.

That's what "alignment" used to mean until about a year ago; the term has since been hijacked to extend to making LLMs polite and obedient. This leads to confusion and people asking "what's the big deal with the 'alignment' thing?". The big deal is with the "avoiding getting casually extincted by a powerful enough AI" kind of alignment. The "reliably preventing LLMs from saying undesired things" kind is a much lesser issue (though probably a small part of the big problem).