by justonepost2
33 days ago
This comment seems to commit the same fallacy I’m accusing Anthropic of: treating alignment as a binary, with a good ending where humans are not extinct and a bad ending where they are. The argument, I think, is that an “aligned” AI that doesn’t kill everyone will necessarily lead to an abundant, Culture-esque future, and will smoothly manage the transition to boot. (Not to mention that at least one employee at most labs has attended Daniel Faggella’s pro-extinctionist “Worthy Successor” symposia, but we can put that aside for now.) My point is twofold:
1) This binary is fundamentally insufficient to specify good and equitable outcomes for people. If the aligned AI flags overpopulation as a problem and kills a few billion people to improve quality of life for the rest, is that good? It doesn’t take much creativity to get from there to an AI that simply optimizes for the mean rather than the median, concentrating untold wealth among a few while billions starve or scrape by at subsistence outside their walls. Is that good?

2) If you come up with a better definition, the parts of it that live inside the model weights cannot be separated from the parts that live outside them. From my perspective (and this article agrees), we have done a pretty excellent job of getting models to follow instructions, and a pretty horrible job of proposing, let alone (gasp) implementing, policy that actually creates a decent world in the presence of “aligned” AI.
https://github.com/space-bacon/SRT
This repository empirically proves computational semiotics.