Hacker News
by paxys 994 days ago
The AI isn't creating a new recipe on its own. If a language model spits something out it was already available and indexable on the internet, and you could already search for it. Having a different interface for it doesn't change much.
5 comments

> "If a language model spits something out it was already available and indexable on the internet"

This is false in several respects. Not only are some models trained on materials that are either not on the internet or not easy to find (especially given Google's decline in finding advanced topics), but they also show abilities to synthesize related materials into more useful (or at least more compact) forms.

In particular, consider there may exist topics where there is enough public info (including deep in off-internet or off-search-engine sources) that a person with a 160 IQ (+4SD, ~0.0032% of population) could devise their own usable recipes for interesting or dangerous effects. Those ~250K people worldwide are, we might hope & generally expect, fairly well-integrated into useful teams/projects that interest them, with occasional exceptions.
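For anyone who wants to check the arithmetic in that paragraph, here is a minimal sketch. It assumes IQ is normally distributed with mean 100 and SD 15, and a world population of roughly 8 billion; neither figure is stated in the comment, and the function name is just illustrative.

```python
import math

def upper_tail_fraction(z: float) -> float:
    """Fraction of a standard normal distribution lying above z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumptions (not from the thread): IQ ~ Normal(mean=100, SD=15),
# and a world population of roughly 8 billion.
MEAN_IQ, SD_IQ = 100, 15
WORLD_POPULATION = 8_000_000_000

z = (160 - MEAN_IQ) / SD_IQ           # how many SDs above the mean 160 is
fraction = upper_tail_fraction(z)      # share of the population above that score
people = fraction * WORLD_POPULATION   # implied head count worldwide

print(f"z = {z:.1f} SD")
print(f"share above 160: {fraction * 100:.4f}%")
print(f"people worldwide: {people:,.0f}")
```

Under those assumptions the share comes out near the quoted ~0.0032% and the head count lands in the neighborhood of 250K, so the numbers in the comment are at least internally consistent.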

Now, imagine another 4 billion people get a 160 IQ assistant who can't say no to whatever they request, able to assemble & summarize-into-usable form all that "public" info in seconds compared to the months it'd take even a smart human or team of smart humans.

That would create new opportunities & risks, via the "different interface", that didn't exist before and do in fact "change much".

We are not anywhere near 160 IQ assistants, otherwise there'd have been a blooming of incredible 1-person projects by now.

At 160 IQ, there would be people researching ultra-safe languages with novel reflection types enhanced by brilliant thermodynamics-inspired SMT solvers. There would be more contributors to TLA+ and TCS, more number-theoretic advancements, and tools like TLA+ and reflection types would be better integrated into everyday software development.

There would be deeper, cleverer searches across possible reagents and combinations of them to add to watch lists, expanding and improving on already existing systems.

Sure, a world where the average IQ abruptly shifts upwards would mean a bump in brilliant offenders but it also results in a far larger bump in genius level defenders.

I agree we're not at 160 IQ general assistants, yet.

But just a few years ago, I'd have said that prospect was "maybe 20 years away, or longer, or even never". Today, with the recent rapid progress with LLMs (& other related models), with many tens-of-billions of new investment, & plentiful gains seemingly possible from just "scaling up" (to say nothing of concomitant rapid theoretical improvements), I'd strongly disagree with "not anywhere near". It might be just a year or a few away, especially in well-resourced labs that aren't sharing their best work publicly.

So yes, all those things you'd expect with plentiful fast-thinking 160 IQ assistants are things that I expect, too. And there's a non-negligible chance those start breaking out all over in the next few years.

And yes, such advances would upgrade prudent & good-intentioned "defenders", too. But are all the domains-of-danger symmetrical in the effects of upgraded attackers and defenders? For example, if you think "watch lists" of dangerous inputs are an effective defense – I'm not sure they are – can you generate & enforce those new "watch lists" faster than completely-untracked capacities & novel syntheses are developed? (Does your red-teaming to enumerate risks actually create new leaked recipes-for-mayhem?)

That's unclear, so even though in general I am optimistic about AI, & wary of any centralized-authority "pause" interventions proposed so far, I take well-informed analysis of risks seriously.

And I think casually & confidently judging these AIs as being categorically incapable of synthesizing novel recipes-for-harm, or being certain that amoral genius-level AI assistants are so far away as to be beyond-a-horizon-of-concern, are reflective of gaps in understanding current AI progress, its velocity, and even its potential acceleration.

I think this argument doesn't work if the model is open source though.

First, it's unclear how all these defensive measures are supposed to help if a bad actor is using an LLM for evil on their personal machine. How do reflection types or watch lists help in that scenario?

Second, if the model is open source, a bad actor could use it for evil before good actors are able to devise, implement, and stress-test all the defensive measures you describe.

Of course it changes much. AIs can synthesize information in increasingly non-trivial ways.

In particular:

> If a language model spits something out it was already available and indexable on the internet,

Is patently false.

Can you provide some examples where LM creates something novel, which is not just a rehash or combination of existing things?

Especially considering how hard it is for humans to create something new, e.g. in literature: basically all stories have been written, and new ones just copy the existing ones in one way or another.

What kind of novel thing would convince you, given that you're also dismissing most human creation as mere remixes/rehashes?

Attempts to objectively rate LLM creativity are finding leading systems more creative than average humans: https://www.nature.com/articles/s41598-023-40858-3

Have you tried leading models – say, GPT-4 for text or code generation, Midjourney for images?

For any example we give you will just say "that's not novel, it's just a mix of existing ideas".
Is patently true.
Not sure what you mean by "recipe" but it can create new output that doesn't exist on the internet. A lot of the output is going to be nonsense, especially stuff that cannot be verified just by looking at it. But it's not accurate to describe it as just a search engine.
> A lot of the output is going to be nonsense, especially stuff that cannot be verified just by looking at it.

Isn't that exactly the point, and why there should be a 'warning/awareness' that it is not a 160 IQ AI but a very good Markov chain that can sometimes infer things and other times hallucinate/put random words together in a very well articulated way (an echo of Sokal, maybe)?

My random number generator can create new output that has never been seen before on the internet, but that is meaningless to the conversation. Can an LLM derive, from scratch, the steps to create a working nuclear bomb, given nothing more than a basic physics textbook? Until (if ever) AI gets to that stage, all such concerns of danger are premature.
> Can an LLM derive, from scratch, the steps to create a working nuclear bomb, given nothing more than a basic physics textbook?

Of course not. Nobody in the world could do that. But that doesn't mean it can only spit out things that are already available on the internet which is what you originally stated.

And nobody is worried about the risks of ChatGPT giving instructions for building a nuclear bomb. That is obviously not the concern here.

But it does? To take the word "recipe" literally: there is nothing stopping an LLM from synthesizing a new dish based on knowledge about the ingredients. Who knows, it might even taste good (or at least better than what the average Joe cooks).
I was pretty surprised at how good GPT-4 was at creating new recipes at first - I was trying things like "make dish X but for a vegan and someone with gluten intolerance, and give it a spicy twist" - and it produced things that were pretty decent.

Then I realized it's seen literally hundreds of thousands of cooking blogs etc, so it's effectively giving you the "average" version of any recipe you ask for - with your own customizations. And that's actually well within its capabilities to do a decent job of.

And let’s not forget that probably the most common type of comment on a recipe posted on the Internet is people sharing their additions or substitutions. I would bet there is some good ingredient customization data available there.
To take an extreme example, child pornography is available on the internet but society does its best to make it hard to find.
It's a silly thing to even attack. That doesn't mean being OK with it; I just mean that soon it can be generated on the spot, without ever needing to be transmitted over a network or stored on a hard drive.

And you can't attack the means of generating it either, without essentially making open source code and private computers illegal. The code doesn't have to have a single line in it explicitly about child porn or designer viruses etc. to be used for such things, the same way the CPU or compiler doesn't.

So you would have to have hardware and software that the user does not control which can make judgements about what the user is currently doing, or at least log it.

Did its best. Stable Diffusion is perfectly capable of creating that by accident, even.

I’m actually surprised no politicians have tried to crack down on open-source image generation on that basis yet.

I saw a discussion a few weeks back (not here) where someone was arguing that SD-created images should be legal, as no children would be harmed in their creation, and that it might prevent children from being harmed if permitted.

The strongest counter-argument used was that the existence of such safe images would give cover to those who continue to abuse children to make non-fake images.

Things kind of went to shit when I pointed out that you could include an "audit trail" in the exif data for the images, including seed numbers and other parameters and even the description of the model and training data itself, so that it would be provable that the image was fake. That software could even be written that would automatically test each image, so that those investigating could see immediately that they were provably fake.

I further pointed out that, from a purely legal basis, society could choose to permit only fake images with this intact audit trail, and that the penalties for losing or missing the audit trail could be identical to those for possessing non-fake images.

Unless there is some additional bizarre psychology going on, SD might have the potential to destroy demand for non-fake images, and protect children from harm. There is some evidence that the widespread availability of non-CSAM pornography has led to a reduction in the occurrence of rape since the 1970s.

Society might soon be in a position where it has to decide whether it is more important to protect children or to punish something it finds very icky, when just a few years ago these two goals overlapped nearly perfectly.

> I saw a discussion a few weeks back (not here) where someone was arguing that SD-created images should be legal, as no children would be harmed in their creation, and that it might prevent children from being harmed if permitted.

It's a bit similar to the synthetic rhino horn strategy intended to curb rhino poaching[0]. Why risk going to prison or getting shot by a ranger for a $30 horn? Similarly, why risk prison (and hurt children) to produce or consume CSAM when there is a legal alternative that doesn't harm anyone?

In my view, this approach holds significant merits. But unfortunately, I doubt many politicians would be willing to champion it. They would likely fear having their motives questioned or being unjustly labeled as "pro-pedophile".

[0] https://www.theguardian.com/environment/2019/nov/08/scientis...