by FeepingCreature 797 days ago
Well, at least if you see escalating measurable harm you'll come around, I'm happy about that. You won't necessarily get the escalating harm even if AI doom is real though, so you should try to discover if it is real even in worlds where hard takeoff is a thing.

> What I would expect is for the people who claim to care about AI doom to actually be trying to measure real world harm.

Why bother? If escalating harm is a thing, everyone will notice. We don't need to bolster that, because ordinary society has it handled.

1 comments

> You won't necessarily get the escalating harm even if AI doom is real though

Yes we would. Unless you are one of those people who think that the magic doom nanobots are going to be invented overnight.

My comparison to someone who is worried about literal magic, from Harry Potter, is apt.

But at that point, if you are worried about magic showing up instantly, then your position is basically not falsifiable. You can always retreat to some untestable, unfalsifiable magic.

Like there is actually nothing I could say, no evidence I could show to ever convince someone out of that position.

On the other hand, my position is actually falsifiable. There is all sorts of non-world-ending evidence that could convince me that AI is dangerous.

But nobody on the doomer side seems to care about any of that. Instead they invent positions that seem almost tailor-made to avoid being falsifiable or disprovable, so that they can continue to believe them despite any evidence to the contrary.

As in, if I were to purposefully invent an idea or philosophy that is impossible to disprove or to be argued out of, the "I can't show you evidence because the world will end" position is what I would invent.

> you'll come around,

Do you admit that you won't though? Do you admit that no matter what evidence is shown to you, you can just retreat and say that the magic could happen at any time?

Or even if this isn't you literally, that someone in your position could dismiss all counter evidence, no matter what, and nobody could convince someone out of that with evidence?

I am not sure how someone could ever possibly engage with you seriously on any of this, if that is your position.

> Like there is actually nothing I could say, no evidence I could show to ever convince someone out of that position.

There is, it is just very hard to obtain. Various formal proofs would do. On upper bounds. On controllability. On scalability of safety techniques.

The Manhattan Project scientists did check whether they'd ignite the atmosphere before detonating their first prototype. Yes, that was a much simpler task. But there's no rule in nature that says proving a system to be safe must be as easy as creating the system. Especially when the concern is that the system is adaptive and adversarial.

Recursive self-improvement is a positive feedback loop, like nuclear chain reactions, like virus replication. So if we have an AI that can program, then we'd better make sure that it either cannot sustain such a positive feedback loop or that it remains controllable beyond criticality. Given the complexity of the task, it appears unlikely that a simple ten-page paper proving this will show up on arXiv. But if one did, that'd be great.

>> You won't necessarily get the escalating harm even if AI doom is real though

> Yes we would.

So what guarantees a visible catastrophe that won't be attributed to human operators using a non-agentic AI incorrectly? We keep scaling, the systems will be treated as assistants/optimizers, and it's always the operator's fault. Until we roughly reach human level on some relevant metrics. And at that point there's a very narrow complexity range from idiot to genius (human brains don't vary by orders of magnitude!). So as far as hardware goes this could be a very narrow range, and we could shoot straight from "non-agentic sub-human AI" to "agentic superintelligence" on short timescales once the hardware has that latent capacity. And up until that point it will always have been human error, lax corporate policies, insufficient filtering of the training set or whatever.

And it's not that it must happen this way. Just that there doesn't seem to be anything ruling it and similar pathways out.