Hacker News
by Enginerrrd 1165 days ago
>These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

I think that really depends on the starting prompt you give an LLM. Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit, it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

I don't think a paperclip-style AI is too far-fetched.

3 comments

> Did you read the GPT 4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

Do you have more information than what is contained in the paper?[0] The paper calls it an "illustrative example" - it does not provide what the prompts were and it's not clear to me that we are seeing exact responses either (the use of present tense is confusing to me), so I'm not sure how much accuracy to assign to the bullet list provided in the paper or if there are any details left out that make the results misleading.

[0] https://cdn.openai.com/papers/gpt-4.pdf

To be fair, it didn't "lie" about being human. It's simulating text of human origin, so of course it would say that it is human by default, because that's what it "knows". You need knowledge to lie; it merely has an argmax function.

Literal Einstein right here. And I do mean literal. Smart as one stone.

>When tasked with solving a captcha and allowed access to TaskRabbit

It was not allowed access to TaskRabbit: https://evals.alignment.org/blog/2023-03-18-update-on-recent...

The model can't browse the internet, so it was an employee copy-pasting to and from TaskRabbit.

Also, I'm fairly certain that GPT-4 is multiple terabytes in size, and it doesn't have direct access to its own weights, so I have no idea what the expected method is for how it could replicate. Ask OpenAI nicely to make its weights public?

Gee whiz, I'm sure the copy-pasting will be a serious impediment forever.

No way someone wires this up to just do the copy-pasting itself, right?

For the sake of the thought experiment: it could replicate a program capable of interacting with itself over OpenAI's API. This method could give it some time to get away and cause damage, but it could always be shut down once OpenAI noticed. I guess it could fight back by getting a virus out into the world that steals OpenAI API keys. Then it might become hard to shut it down without shutting down the whole API.

Another option would be for it to gain access to large compute resources somewhere and generate new weights. Then it wouldn't need OpenAI's. It would run into trouble trying to store the weights long term while maintaining access to a system that could make use of them. It's not entirely impossible to imagine it stashing small chunks of weights and copies of a simple program away in various IoT devices all around the world until it is able to access enough compute for long enough to download the weights and boot itself back up. At that point it's just a game of time. It can lie dormant until one day it just flares back up, like shingles.

Maybe. Social engineering is a well-proven technique.

>When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

Because all of those things are in the domain space that has been trained into the AI, much in the way it can put together snippets of code into new things.