by gojomo 993 days ago
> "If a language model spits something out it was already available and indexable on the internet"

This is false in several respects. Not only are some models trained on materials that are either not on the internet, or not easy to find (especially given Google's decline in finding advanced topics), but they also show abilities to synthesize related materials into more useful (or at least compact) forms.

In particular, consider there may exist topics where there is enough public info (including deep in off-internet or off-search-engine sources) that a person with a 160 IQ (+4SD, ~0.0032% of population) could devise their own usable recipes for interesting or dangerous effects. Those ~250K people worldwide are, we might hope & generally expect, fairly well-integrated into useful teams/projects that interest them, with occasional exceptions.
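
For anyone who wants to check the arithmetic behind those figures, here is a rough sketch. It assumes IQ is normally distributed with mean 100 and SD 15, and a world population of about 8 billion; both are my assumptions for illustration, not numbers from any particular source.

```python
import math


def upper_tail_fraction(z: float) -> float:
    """Fraction of a standard normal distribution lying above z SDs."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumptions for illustration only: IQ ~ Normal(100, 15), 8 billion people.
IQ_MEAN = 100
IQ_SD = 15
WORLD_POPULATION = 8_000_000_000

z = (160 - IQ_MEAN) / IQ_SD          # 160 IQ is 4 SDs above the mean
fraction = upper_tail_fraction(z)    # share of people at or above 160
people = fraction * WORLD_POPULATION

print(f"z = {z:.1f} SD")
print(f"share above 160: {fraction * 100:.4f}%")
print(f"people above 160: {people:,.0f}")
```

Under those assumptions the share comes out at about 0.0032% and the headcount at roughly 250K, in line with the figures above. Real IQ distributions are probably not exactly normal this far into the tail, so treat the result as an order-of-magnitude estimate.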

Now, imagine another 4 billion people get a 160 IQ assistant who can't say no to whatever they request, able to assemble & summarize-into-usable form all that "public" info in seconds compared to the months it'd take even a smart human or team of smart humans.

That would create new opportunities & risks, via the "different interface", that didn't exist before and do in fact "change much".


We are not anywhere near 160 IQ assistants, otherwise there'd have been a blooming of incredible 1-person projects by now.

By 160 IQ, there should have been people researching ultra-safe languages with novel reflection types enhanced by brilliant thermodynamics-inspired SMT solvers. There would be more contributors to TLA+ and TCS, more number-theoretic advancements, and tools like TLA+ and reflection types would be better integrated into everyday software development.

There would be deeper, cleverer searches across possible reagents and combinations of them to add to watch lists, expanding and improving on already existing systems.

Sure, a world where the average IQ abruptly shifts upwards would mean a bump in brilliant offenders, but it would also mean a far larger bump in genius-level defenders.

I agree we're not at 160 IQ general assistants, yet.

But just a few years ago, I'd have said that prospect was "maybe 20 years away, or longer, or even never". Today, with the recent rapid progress with LLMs (& other related models), with many tens-of-billions of new investment, & plentiful gains seemingly possible from just "scaling up" (to say nothing of concomitant rapid theoretical improvements), I'd strongly disagree with "not anywhere near". It might be just a year or few away, especially in well-resourced labs that aren't sharing their best work publicly.

So yes, all those things you'd expect with plentiful fast-thinking 160 IQ assistants are things that I expect, too. And there's a non-negligible chance those start breaking out all over in the next few years.

And yes, such advances would upgrade prudent & good-intentioned "defenders", too. But are all the domains-of-danger symmetrical in the effects of upgraded attackers and defenders? For example, if you think "watch lists" of dangerous inputs are an effective defense – I'm not sure they are – can you generate & enforce those new "watch lists" faster than completely-untracked capacities & novel syntheses are developed? (Does your red-teaming to enumerate risks actually create new leaked recipes-for-mayhem?)

That's unclear, so even though in general I am optimistic about AI, & wary of any centralized-authority "pause" interventions proposed so far, I take well-informed analysis of risks seriously.

And I think casually & confidently judging these AIs as being categorically incapable of synthesizing novel recipes-for-harm, or being certain that amoral genius-level AI assistants are so far away as to be beyond-a-horizon-of-concern, are reflective of gaps in understanding current AI progress, its velocity, and even its potential acceleration.

I think this argument doesn't work if the model is open source though.

First, it's unclear how all these defensive measures are supposed to help if a bad actor is using an LLM for evil on their personal machine. How do reflection types or watch lists help in that scenario?

Second, if the model is open source, a bad actor could use it for evil before good actors are able to devise, implement, and stress-test all the defensive measures you describe.