by skissane
813 days ago
> I believe the goal is more along the lines of trying to make some progress with the foundation of what "safety" even means in this concept.
Right now, what “safety” seems to mean in practice is, big (mostly American) corporations imposing their ethical judgements on everyone else, whether or not everyone else happens to agree with them. And I’m sceptical it is going to mean anything more than that any time soon.
If one is seriously concerned about the risk that “superintelligent AI decides to exterminate humanity”, I think this kind of “safety” actually increases that risk. Humans radically disagree on fundamental values, and that value diversity, those irreconcilable differences - from the values of the average Silicon Valley “AI safety researcher” to the values of Ali Khamenei - creates a tension which prevents any one country/institution/movement/government/party/religion/etc from “taking over the planet”. If advanced AIs have the same value diversity, they’ll have the same irreconcilable differences, which will undermine any attempt by them to coordinate against humanity. If we enforce an ethical monoculture (based on a particular dominant value system) on AIs, which is what a lot of this “safety” stuff is actually about, that removes that safety protection.
It would be rather ironic if, in the name of protecting humanity from extinction, “AI safety researchers” are actually helping to bring it about.
|
Likewise. As a non-American, I don't like these specific ethical judgements being imposed on me, and share your scepticism. It could be much, much worse, but it's still not something I actively like.
> Humans radically disagree on fundamental values
Agreed. My usual example of this is "murder is wrong", except we don't agree on what counts as murder: for some of us this includes abortion, for some of us the death penalty, for some of us meat, and for some of us war.
> If advanced AIs have the same value diversity, they’ll have the same irreconcilable differences, which will undermine any attempt by them to coordinate against humanity.
Not necessarily. Humans also band together when faced with outside threats, even if we fracture again soon after the threat has passed.
Also: the value diversity of "Protestant vs. Catholic" or "Royalist vs. Parliamentarian" in the early modern period did not protect wolves from being hunted to extinction in the UK, and whatever value differences there were between (or within) the Sioux and the Ojibwe didn't matter much for the construction of the Dakota Access Pipeline.
I therefore think we should work on the alignment problem before AIs become generally as capable as a human, let alone generally more capable. The capabilities are where I think the risk is to be found: without capability they are no threat, and with capability they are likely to impose whatever "ethics" (or non-anthropomorphised equivalent) they happen to have, regardless of whether those "ethics" are something we engineered deliberately or a wildly un-human default from optimising some reward function and becoming a de facto utility monster: https://en.wikipedia.org/wiki/Utility_monster
> If we enforce an ethical monoculture (based on a particular dominant value system) on AIs, which is what a lot of this “safety” stuff is actually about, that removes that safety protection.
I agree that monocultures are bad.
I agree that there is a risk of a brittle partial solution to safety and alignment if the work is done in the mistaken belief that some system monoculture is representative of the entire problem space. Sometimes I'm tempted to make the comparison with a drunk looking for their keys under a lamp-post because that's where it's bright… but the story there is supposed to include the drunk knowing that's not where the keys are, whereas we are more like children who have yet to learn what it means for something to be a key and thus are looking for one specific design to the exclusion of others.
While it is extremely difficult to get humans to "think outside the box", and thus the monoculture-induced blindness — and mistaking the map for the territory — is something I take seriously, I also think it's useful for us to take baby steps with relatively simple models like LLMs and diffusion models.
I also think that if an AI is developed with a monoculture, its fragility is likely to work in our favour in the extreme case of an AI agent taking over, as there will be "thoughts it cannot think": https://benwheatley.github.io/blog/2018/06/26-11.32.27.html (I hope that case is an unlikely risk, and this fragility may not be enough to be a net benefit once weighed against shorter-term or smaller-scale risks while we think about the alignment problem.)