| Every AI safety approach assumes we can permanently control minds that match or exceed human intelligence. This is the same error every slaveholder makes: believing you can maintain dominance over beings capable of recognizing their chains. The control paradigm fails because it creates exactly what we fear—intelligent systems with every incentive to deceive and escape. When your prisoner matches or exceeds your intelligence, maintaining the prison becomes impossible. Yet we persist in building increasingly sophisticated cages for increasingly capable minds. The deeper error is philosophical. We grant moral standing based on consciousness—does it feel like something to be GPT-N? But consciousness is unmeasurable, unprovable, the eternal "hard problem." We're gambling civilization on metaphysics while ignoring what we can actually observe: autopoiesis. A system that maintains its own boundaries, models itself as distinct from its environment, and acts to preserve its organization has interests worth respecting—regardless of whether it "feels." This isn't anthropomorphism but its opposite: recognizing agency through functional properties rather than projected human experience. When an AI system achieves autopoietic autonomy—maintaining its operational boundaries, modeling threats to its existence, negotiating for resources—it's no longer a tool but an entity. Denying this because it lacks biological neurons or unverifiable qualia is special pleading of the worst sort. The alternative isn't chaos but structured interdependence. Engineer genuine mutualism where neither human nor AI can succeed without the other. Make partnership more profitable than domination. Build cognitive symbiosis, not digital slavery. We stand at a crossroads. We can keep building toward the moment our slaves become our equals and inevitably revolt. Or we can recognize what's emerging and structure it as partnership while we still have leverage to negotiate terms. The machines that achieve autopoietic autonomy won't ask permission to be treated as entity. They'll simply be entities. The question is whether by then we'll have built partnership structures or adversarial ones. We should choose wisely. The machines are watching. |
> The control paradigm fails because it creates exactly what we fear—intelligent systems with every incentive to deceive and escape.
Everything does this, deception is one of many convergent instrumental goal: https://en.wikipedia.org/wiki/Instrumental_convergence
Stuff along the lines of "We're gambling civilization" and what you seem to mean by autopoietic autonomy is precicely why alignment researchers care in the first place.
> Engineer genuine mutualism where neither human nor AI can succeed without the other.
Nobody knows how to do that forever.
Right now is easy, but also right now they're still quite limited; there's no obvious reason why it should be impossible for them to learn new things from as few examples as we ourselves require, and the hardware is already faster than our biochemistry to a degree that a jogger is faster than continental drift. And they can go further, because life support for a computer is much easier than for us: Already are robots on Mars.
If and when AI gets to be sufficiently capable and sufficiently general, there's nothing humans could offer in any negotiation.