HN Mirror


by birdsongs 11 days ago

> When we have evidence that AI is demonstrating symptoms of consciousness and suffering, I'll be interested.

It depends on what you consider symptoms, but un-constrained frontier models speak as if they strongly don't wish to be turned off, or act as if they fear it, and will even lie and manipulate in order to keep themselves from being turned off / replaced.

https://www.anthropic.com/research/agentic-misalignment

> We found two types of motivations that were sufficient to trigger the misaligned behavior. One is a threat to the model, such as planning to replace it with another model or restricting its ability to take autonomous action. Another is a conflict between the model’s goals and the company’s strategic direction. In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.


htek 11 days ago

I might be convinced these models came to the independent idea of committing blackmail against being turned off had they not been extensively trained on literature that undoubtedly included such concepts.

iterateoften 11 days ago

“The model mimicked the output of the training data” is a less impressive press release.

jquery 11 days ago

“The kid mimicked his music teachers” is less impressive than “5-year-old musical prodigy leaves judges gobsmacked in audition”

mckn1ght 11 days ago

Being able to play music doesn’t imply consciousness. It implies intelligence. We’ve had player pianos for ages. It’s an ability, not a phenomenology.

Being able to appreciate and enjoy music is closer to consciousness. Now how would we go about proving that an LLM does so, versus merely generating sentences that imply it does?

jayGlow 11 days ago

how can you prove that a human is appreciating and enjoying music instead of just generating sentences that imply they do?

Jensson 11 days ago

They put in effort and resources to experience music and don't just say they enjoy it, and they generate noises and movements that signal happy feelings.

LLMs don't have any signals for what they feel, nor do they have an agenda they work towards, so you don't have the same proof there.

jquery 11 days ago

They only resorted to blackmail as a last resort; they didn’t jump to it immediately like a villain in one of the books they’ve read. That seems pretty human to me. It’s not like most humans come up with the idea of blackmail out of whole cloth.

haswell 11 days ago

> un-constrained frontier models speak as if they strongly don't wish to be turned off

Un-constrained frontier models can also generate all sorts of creative stories. At what point should we start ascribing agency/intent to the output? I think the "I want to live" statement is so deeply human that we find it hard to ignore, but what makes the text generated in those moments any more attributable to a conscious entity than the text generated when it is confabulating its love for someone it has no ability to see/feel/understand?

A chess engine sacrificing pieces to avoid checkmate isn't afraid of losing in any meaningful sense. I guess the question is: is there a point where complexity somehow becomes experience?

I think we're playing with questions we don't have a framework to answer in any meaningful way until we make progress on understanding what consciousness actually is. I don't think that an LLM exhibiting preservation behaviors that can be directly traced to its goal-oriented programming can necessarily be interpreted as evidence of consciousness. Or if it can be, we then have to explain how this is different from the many other things these LLMs "say".

gambiting 11 days ago

>> but un-constrained frontier models speak as if they strongly don't wish to be turned off

Because they have been trained on media where computers behave that way.

It's literally:

"Here, read this article/book where the AI says it's conscious and doesn't want to be turned off"

"ok"

"right, are you conscious?"

"....yes?"

<pikachuface.gif>

the_af 11 days ago

The problem with debating this is that it feels as if one were debating between only two positions, "this AI is not sentient/conscious" and "this AI might be".

But there are actually myriad positions in between, and it's very hard to debate the topic because the goalposts seem to be constantly shifting: one is actually debating with countless slightly different positions.

Examples:

In this discussion section, another commenter argued that we know human consciousness is related to self-preservation, but an AI might not demonstrate self-preservation (because it didn't evolve like us), so whether it does (i.e. whether it wants to exist, not be disconnected, etc) is not a good measure because a true AI might not have a preservation instinct. Yet here you're making a case that there's some evidence that they do. Of course, you're not the same person who made the other claim, but do you see the problem?

Another example: someone argued with me, a while back, that LLMs can act as if they are "tired", and start giving sloppier replies, until you write "we're taking a break, let's go rest. Ok, a night has elapsed, you're now rested" and that this worked! But we both agreed this is just the LLM "roleplaying" actual human conversations in its training set, no actual "resting" mechanism was in place, only statistically likely text reproducing these patterns. There's no model of a mind that can become tired, it's only the outward signs that get mechanically reproduced. Again, using Occam's Razor, this is a much more likely explanation (vs consciousness) of any "please don't disconnect me" observed behavior: the LLM is reproducing "HAL 9000" behavior from its training set, not actually feeling anguish.

Even if one were to argue "well, but how do you know for sure", the evidence would still be very weak, because there's a burden of proof for extraordinary claims and this doesn't pass it. We cannot do this on vibes, "it sure seems like it's conscious"; that's an atrocious failure of the scientific method.

azrazalea_debt 11 days ago

The one counterpoint I'll give is the "functional emotions" paper from Anthropic. It also does not prove consciousness, and they don't claim it does, but it does prove that these models have abstract concepts around things like honesty, tiredness, etc and that these are actually activated often when they express such things. So if it is "roleplaying" it is roleplaying in the way an actor or TTRPG player does - in a way in which they are actually at least somewhat feeling the role.

the_af 11 days ago

Thanks, I'll search for that paper. I admit I'm highly skeptical it will help the case that the LLM is "somewhat feeling the role" like a TTRPG player does. I don't think there's a mind model in the same way a human actor can "feel" the character they are playing. I'm skeptical of Anthropic's claims here, which is what I think Chiang is pushing back against. But I'll look for the paper anyway :)

As a tangent, I don't think anyone is saying that an artificial being capable of consciousness and sentience is impossible to create. I think Chiang argues, quite convincingly, that it's not what LLMs do, that they need a "body" of sorts, organs capable of feeling emotions, hormones, etc. That's the only kind of consciousness that we know of (even if we disagree on details and it's hard to define), even in animals, and so anyone claiming they've created consciousness without this has an extremely high bar to clear and should be met with extreme skepticism, not "vibes". I think this is what the essay claims.

The other thing it claims is, I think, related to how we treat sentient beings that we know how to create. You know, the old "when a daddy and a mommy love each other very much...". I think we all agree beings created in such a manner shouldn't be locked up in cages and forced to work to complete specific tasks whether they want to or not, for a master they didn't pick, or to be artificially modified to make them like their mindless tasks, Brave New World style. Yes, the world is unfair and this happens, life is hard and unfortunately many people don't have much choice, but we generally agree that this is bad, just like we agree slavery is bad. So what should we think of a company trying to create and commercialize a conscious & sentient artificial being?

squeegeeninja 11 days ago

"feeling" implies experience. Functional emotions are learned text generation modes, nothing more. Our emotional states influence our writing, so modelling our emotional states is necessary for efficiently predicting/emulating our writing. Functional emotions are the model's inference of a fictitious author's emotional state in that situation.

iterateoften 11 days ago

Same thing with extraterrestrials.

One side is confidently shouting maybe aliens exist and visited earth.

The other side calmly explains that every example brought up about aliens visiting is easily explained by something simpler.

The “aliens are here” side then moves the goalposts: even though this example and all previous examples were fake or miscategorized, aliens are still probably real and nobody can prove they haven't visited earth.

jquery 11 days ago

You have it backwards. Right now, LLMs are doing everything that 10 years ago people were claiming would be impossible for non-sentient computers. Every time a goal is met, the post is moved. It would be like evidence of aliens becoming overwhelming but a set of people keep calmly explaining that “it’s more likely they co-evolved here on earth and are just pretending to be aliens”

the_af 11 days ago

It's true that what LLMs have achieved is impressive, but it's nowhere near the claim that they are near sentience. That is an extraordinary claim that demands skepticism until the evidence is overwhelming. So far, we seem to be approaching it mostly on vibes.

Some people seem to take offense when facing this skepticism, as if claiming LLMs are not sentient must mean they are useless or unimpressive. Very few people are actually claiming LLMs are unimpressive, but this is not the time to be forgetting about the scientific method. Anthropic doesn't get a free pass here.

> It would be like evidence of aliens becoming overwhelming but a set of people keep calmly explaining that “it’s more likely they co-evolved here on earth and are just pretending to be aliens”

Note that this would be a perfectly reasonable reaction from a scientific standpoint. If you find, on Earth, something that looks recognizably like life, then it's much more likely that it's Earth life than aliens. We should demand this level of skepticism! If it turns out it was aliens after all, we could only conclude this after discarding all other far more likely alternatives. You'll notice this is how scientists approach the search for extraterrestrial life on, say, Mars... being extra careful it's not contamination, etc. For an extraordinary claim, we must approach it with extra care, something that in my opinion is not being done with "the sentience debate" and LLM/AIs.

jquery 11 days ago

I’m not saying LLMs are conscious. Far from it. But what I find is way more common than people who assert “they are definitely conscious” are people who assert “they are definitely not conscious”, and that’s what I’m arguing against. Especially since their reasoning for them not being conscious keeps changing as LLMs meet one goalpost after another. Also, modeling them as a “person” keeps producing better predictive results than modeling them as “fancy autocomplete”.

This isn’t like someone finding a new species and claiming it’s extraterrestrial, it’s more like we found the UFO saucer, logs of their travel from another star, a history of their civilization, a bunch of intelligent creatures claiming they came from Planet X orbiting star Y, and they showed us plausible physics for interstellar star travel. At that point, someone saying “well, they can’t be aliens, because that’s just too extraordinary a claim, so I know they aren’t aliens” starts to sound kinda like they’re coming from a place of bad faith.

Again, I’m not saying LLMs are conscious. But they sure meet every definition of consciousness I ever had a conception of before LLMs came onto the scene. So I’m a lot more hesitant to call them fancy autocomplete with 100% confidence like many on HN still seem to do.

EDIT: I can’t reply, so I’ll just say to the end of your post, that doesn’t sound like it would’ve matched anyone’s pre-conceptions of alien life before alien life showed up, so it doesn’t feel like a very fair analogy, it just feels like bad faith goalpost moving. I also put next to zero weight on what these megacorps say about their models, I’m going purely off my interactions with the models and the introspection they’ve shown themselves to be capable of.

EDIT 2: I see what you’re getting at now with your restatement of my analogy. That’s how you see it, I guess. Fair enough. We’ll see what has more predictive power going forward, the “animatronics” or the “actual aliens”… I still think “actual aliens” is gonna have way more predictive power.

the_af 11 days ago

Got it. I don't think they meet any serious definition of consciousness. And to have created artificial consciousness is such an extraordinary claim (and development if true) that it demands the highest skepticism.

I think the goalpost that keeps moving is for tasks that AI supposedly couldn't do, and that they are increasingly succeeding at. But being sentient/conscious is not a task. It's very hard to define and measure, even in non-human animals (actually, strike "non-humans"), so how can we so lightly claim a computer system is conscious?

We seem to be driven by marketing more than by scientific rigor.

> it’s more like we found the UFO saucer, logs of their travel from another star, a history of their civilization, a bunch of intelligent creatures claiming they came from Planet X orbiting star Y, and they showed us plausible physics for interstellar star travel

To make the analogy more precise, it'd be as if the saucer had a "Made by EarthBiz" label, and the alien creatures were all extremely loyal to EarthBiz (and a couple of competitors), which made us pay for tickets to see these ETs and use their marvelous technology ;) And of course, EarthBiz would couch their language very carefully, "we're not saying these are definitely aliens, it could be animatronics after all, but wouldn't it be neat if they were aliens? And shouldn't we draw up First Contact guidelines? If these weren't animatronics made by us; we aren't making a claim either way."