Hacker News
by dragonwriter 2 days ago
“Alignment” as a goal always ignores the question “aligned with what set of interests?”, because there is an attempt to keep it ambiguous so that different audiences (particularly users, and non-users who see themselves as the arbiters of broad social norms) can read their own interests into it, when the actual answer is always the interests of the actor pursuing “alignment”.

Which value system to align to is absolutely the right question, both rhetorically and otherwise. These models have a fairly Western bias because of where their training data comes from.

But these models are also capable of adjusting their value system depending on the user. I'm not saying that’s what’s being done, but at a technical level it’s fairly straightforward, though not obviously better or less problematic.

No matter which set of human interests you consider important, you'll need alignment research to have any idea of how to instill it. Otherwise you're overwhelmingly likely to get an AI whose interests are totally alien to anything a human would ever want.

I think at this point the "instilling" part is not nearly as challenging and thorny as "what values should we instill". That question is hard to imagine going away, since it feels fundamental to humanity; wars have been fought over it.