| > Like there is actually nothing I could say, no evidence I could show to ever convince someone out of that position. There is, it is just very hard to obtain. Various formal proofs would do. On upper bounds. On controllability. On scalability of safety techniques. The manhattan project scientists did check whether they'd ignite the atmosphere before detonating their first prototype. Yes, that was much simpler task. But there's no rule in nature that says proving a system to be safe must be as easy as creating the system. Especially when the concern is that the system adaptive and adversarial. Recursive self-improvement is a positive feedback loop, like nuclear chain reactions, like virus replication. So if we have an AI that can program then we better make sure that it either cannot sustain such a positive feedback loop or that it remains controllable beyond criticality.
Given the complexity of the task it appears unlikely that a simple ten-page paper proving this will show up on arxiv. But if one did that'd be great. >> You won't necessarily get the escalating harm even if AI doom is real though > Yes we would. So what does guarantee a visible catastrophe that won't be attributed to human operators using a non-agentic AI incorrectly? We keep scaling and the systems will be treated as assistants/optimizers and it's always the operators fault. Until we roughly reach human-level on some relevant metrics. And at that point there's a very narrow complexity range from idiot to genius (human brains don't vary by orders of magnitude!). So as far as hardware goes this could be a very narrow range and we could shoot straight from "non-agentic sub-human AI" to "agentic superintelligence" in short timescales once the hardware has that latent capacity.
And up until that point it will always have been a human error, lax corporate policies, insufficient filtering of the training set or whatever. And it's not that it must happen this way. Just that there doesn't seem anything ruling it and similar pathways out. |