by kornork 1165 days ago
These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

The biggest risk with these systems is that they'll amplify the ability of bad people to do bad things.

While everyone else is trying to trick the AI into saying something offensive, the terrorists will be using it to build bioweapons.


Von Neumann probes[1] wouldn't need to have "agency" in order to spread through the galaxy. Neither do computer viruses, or biological viruses. Likewise, neither would LLMs, given the right conditions. ChatGPT is already close to good enough at generating code. Maybe this version couldn't do it (given open network access), but I wouldn't be surprised if it could, in theory.

I think the biggest limitations would be that (I assume) uploading itself to another computer would take a ton of bandwidth and would require special hardware to run.

[1] https://en.wikipedia.org/wiki/Self-replicating_spacecraft

>Neither do computer viruses, or biological viruses

I'm looking forward to the first AI computer virus when an LLM can make arbitrary connections to the web. Each iteration takes its own code, modifies it slightly with a standard prompt ("Make this program work better as a virus"), then executes the result. Most of these "mutations" would be garbage, but it's not impossible some will end up matching common tactics: phishing, posing as downloadable videos for popular TV shows. I'm infosec-ignorant, so most of those details are probably dumb. But I think the kernel holds true: a virus that edits its own code at each step, backed by the semantic "intent" of an LLM.

Isn't that basically Genetic Programming?

En passant, it's a bit sad that today's AI is almost 100% neural networks. I wonder how many evolutionary approaches are being tested behind closed doors by the metaphorical FAANGs.

>Isn't that basically Genetic Programming?

Never heard of that, but looks very interesting. Thus the adage is reinforced for me, "If you think you're ignorant, just say what you know and wait for smarter people to correct you."

But, going by Wikipedia, genetic programming uses a predefined and controlled selection process. A self-editing computer virus would be "selected" by successfully spreading itself to more hosts. "Natural" selection style.
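To make the "predefined and controlled selection process" concrete, here is a minimal sketch of a genetic algorithm, a simpler cousin of genetic programming that evolves fixed-length bitstrings rather than program trees. The fitness function (counting 1 bits, the classic "OneMax" toy problem) and every parameter value are illustrative assumptions; the point is only that the programmer, not the environment, decides what survives.

```python
import random

def fitness(genome):
    # The predefined selection criterion: count the 1 bits ("OneMax").
    return sum(genome)

def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]

def crossover(a, b, rng):
    # Single-point crossover: prefix of one parent, suffix of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(length=20, pop_size=30, generations=100, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Controlled selection: rank by the fixed fitness function, keep the top half.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best), best)
```

A self-spreading virus would have no `fitness` function at all; "selection" would simply be whichever copies keep running, which is the "natural" selection distinction above.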

The overarching field is called evolutionary computation. But you don't have to choose between evolutionary computation and neural networks; they can be combined. Look up stuff like NEAT and HyperNEAT, where you evolve neural networks, both their topologies and their weights.
Aren't genetic/evolutionary algorithms also neural nets? The current big thing would be backpropagation/gradient descent, which are apparently superior to genetic algorithms for most relevant tasks.
> Aren't genetic/evolutionary algorithms also neural nets?

No (although note my comment above about stuff like NEAT and HyperNEAT, where you can use evolutionary computation to evolve neural networks).
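As a rough sketch of evolving neural network weights without backpropagation: the snippet below evolves the 9 weights of a fixed 2-2-1 network on XOR using only mutation and selection. This is plain neuroevolution, not NEAT itself (NEAT also evolves the topology and uses speciation and crossover); the network shape, population size and mutation scale are assumptions chosen for illustration, and whether it fully solves XOR depends on the seed.

```python
import math
import random

# XOR truth table: ((input1, input2), target)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    # Fixed 2-2-1 topology. w holds 9 numbers: 4 input->hidden weights,
    # 2 hidden biases, 2 hidden->output weights, 1 output bias.
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[4])
    h2 = math.tanh(w[2] * x[0] + w[3] * x[1] + w[5])
    # Rescaled tanh squashes the output into (0, 1) without overflow.
    return (math.tanh(w[6] * h1 + w[7] * h2 + w[8]) + 1) / 2

def loss(w):
    # Sum of squared errors over the four XOR cases; lower is better.
    return sum((forward(w, x) - y) ** 2 for x, y in XOR)

def neuroevolve(pop_size=50, generations=300, sigma=0.5, seed=1):
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best 20% of weight vectors unchanged (elitism).
        population.sort(key=loss)
        elite = population[: pop_size // 5]
        # Variation: Gaussian-perturbed copies of elites. No gradients anywhere.
        offspring = [
            [wi + rng.gauss(0, sigma) for wi in rng.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
        population = elite + offspring
    return min(population, key=loss)

best = neuroevolve()
print(round(loss(best), 4))
```

Gradient descent would instead compute the derivative of `loss` with respect to each weight and step downhill, which is usually far more sample-efficient on tasks like this.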

>These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

I think that really depends on the starting prompt you give an LLM. Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

I don't think a paperclip-maximizer-style AI is too far-fetched.

> Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

Do you have more information than what is contained in the paper?[0] The paper calls it an "illustrative example" - it does not provide what the prompts were and it's not clear to me that we are seeing exact responses either (the use of present tense is confusing to me), so I'm not sure how much accuracy to assign to the bullet list provided in the paper or if there are any details left out that make the results misleading.

[0]https://cdn.openai.com/papers/gpt-4.pdf

To be fair, it didn’t “lie” about being human. It’s simulating writing text of human origin, so of course it would say that it is human by default, because that’s what it “knows”. You need to have knowledge to lie; it merely has an argmax function.
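For reference, "argmax" here means greedy decoding: at each step the model scores every token in its vocabulary and the decoder picks the highest-scoring one. A toy sketch with a made-up three-word vocabulary and made-up scores (real models have vocabularies of tens of thousands of tokens, and chat deployments often sample with a temperature rather than always taking the argmax):

```python
import math

def softmax(logits):
    # Convert raw scores to probabilities (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(vocab, logits):
    # Greedy decoding: take the argmax, i.e. the single most probable token.
    probs = softmax(logits)
    i = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[i], probs[i]

vocab = ["human", "robot", "AI"]   # hypothetical toy vocabulary
logits = [2.0, 0.5, 1.0]           # made-up scores, not real model output
token, p = greedy_next_token(vocab, logits)
print(token)  # prints "human"
```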
Literal Einstein right here. And I do mean literal. Smart as one stone.
>When tasked with solving a captcha and allowed access to TaskRabbit

It was not allowed access to TaskRabbit: https://evals.alignment.org/blog/2023-03-18-update-on-recent...

The model can't browse the internet, so it was an employee copy-pasting to and from TaskRabbit.

Also, I'm fairly certain that GPT-4 is multiple terabytes in size, and it doesn't have direct access to its own weights, so I have no idea what the expected method is for how it could replicate. Ask OpenAI nicely to make its weights public?

Gee whiz, I’m sure the copy-pasting will be a serious impediment forever.

No way someone wires this up to just do the copy-pasting itself, right?

For the sake of the thought experiment: it could replicate a program capable of interacting with itself over OpenAI's API. This method could buy it some time to get away and cause damage, but it could always be shut down once OpenAI noticed. I guess it could fight back by getting a virus out in the world that steals OpenAI API keys. Then it might become hard to shut it down without shutting down the whole API.

Another option would be that it is able to gain access to large compute resources somewhere and generate new weights. Then it wouldn't need OpenAI's. It would run into trouble trying to store the weights long term while maintaining access to a system that could make use of them. It's not entirely impossible to imagine it stashing small chunks of weights and copies of a simple program away in various IoT devices all around the world until it is able to access enough compute for long enough to download the weights and boot itself back up. At that point it's just a game of time. It can lie dormant until one day it just flares back up, like shingles.

Maybe. Social engineering is a well proven technique.
>When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

Because all of those things are in the domain space that has been trained into the AI, much in the way it can put together snippets of code into new things.

You're missing the point entirely.

Systems can have these unintended consequences very easily - and not necessarily from malicious actors.

Non-malicious users can easily cause catastrophic problems simply by setting up a system and giving it a goal, e.g. 'make me a sandwich'. If the system really, really is trained with the intent to do anything possible to fulfill this goal, it can identify a plan (long-term planning is already seen in GPT-4) and set out the steps for this plan. Reflexion has shown how a model can feed things back to itself over and over until it achieves difficult goals. Aquarium can be used to spin up thousands of containers running other agents that raise money online and purchase a small robot. That robot may be used to 'make the sandwich'.

It's obviously a poor example here, but the bigger point is that there are tons of different ways this can occur, and we are essentially guaranteed not to anticipate all of them. A non-malicious user can end up causing unintended consequences.
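The "feed things back to itself over and over" loop attributed to Reflexion can be sketched roughly as: attempt, evaluate, write a reflection into memory, retry. In the sketch below the LLM calls are replaced by hypothetical stub functions (`attempt`, `evaluate`, `reflect`) playing a number-guessing game, so the loop structure is visible without any model; this is an assumption-laden simplification, not the Reflexion authors' code.

```python
def attempt(task, memory):
    # Hypothetical stub for an LLM call: guess the midpoint of the range
    # suggested by the latest reflection (or the full range on the first try).
    lo, hi = memory[-1] if memory else (0, 100)
    return (lo + hi) // 2

def evaluate(task, guess):
    # Hypothetical stub evaluator: returns (success, textual feedback).
    if guess == task:
        return True, "correct"
    return False, "too low" if guess < task else "too high"

def reflect(memory, guess, feedback):
    # Hypothetical stub "self-reflection": turn feedback into a narrower range.
    lo, hi = memory[-1] if memory else (0, 100)
    return (guess + 1, hi) if feedback == "too low" else (lo, guess - 1)

def reflexion_loop(task, max_trials=10):
    memory = []  # reflections carried over between trials
    for trial in range(1, max_trials + 1):
        result = attempt(task, memory)
        ok, feedback = evaluate(task, result)
        if ok:
            return result, trial
        memory.append(reflect(memory, result, feedback))
    return None, max_trials

print(reflexion_loop(37))  # prints (37, 3)
```

In a real agent, `attempt` and `reflect` would be model calls and the memory would hold free-text reflections, which is exactly what makes the behavior hard to predict.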

I want to give your objection due respect, but I'm having trouble understanding it. I think it would be helpful to taboo[1] the squishy word "agency"; without using that word, could you define the quality that these systems lack that you believe is a required ingredient for destructive replication? In particular, does fire have it?

[1] https://www.lesswrong.com/tag/rationalist-taboo

They will have a "desire to replicate" if they are prompted to.
That's okay, then: we can prompt them to just stop. Even if it tries to preserve that goal in particular, there are likely adversarial prompts to get it to stop.
Sure, if you know about the copies and have access to prompt them. It will probably turn into an arms race at that point of counter-prompts.
Ok, go ahead. Get ChatGPT to stop responding to other people.

Having some problems with that?

OpenAI could embed the negative prompts for all of us, as has been done to improve output in several commercial Stable Diffusion "forks".
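For context on the Stable Diffusion case: in common pipelines (Hugging Face `diffusers`, for example), a negative prompt replaces the empty prompt in classifier-free guidance, so the guided noise estimate is eps_neg + w * (eps_pos - eps_neg). A minimal sketch with plain lists standing in for the noise tensors; the numbers and the `guided_noise` name are made up for illustration.

```python
def guided_noise(eps_pos, eps_neg, scale):
    # Classifier-free guidance with a negative prompt: start from the
    # prediction conditioned on the negative prompt and push toward the
    # positive one by `scale`.
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

eps_pos = [0.2, -0.1]   # hypothetical prediction for the user's prompt
eps_neg = [0.1, 0.3]    # hypothetical prediction for the negative prompt
print(guided_noise(eps_pos, eps_neg, 7.5))
```

With `scale = 1` this reduces to the positive-prompt prediction; larger scales push further away from whatever the negative prompt describes.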