Hacker News
by butlike 53 days ago
This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?

"Hey AGI, how's that cure for cancer coming?"

"Oh it's done just gotta...formalize it you know. Big rollout and all that..."

I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.

20 comments

Douglas Adams would be proud!
You think you've got problems? What are you supposed to do if you are a manically depressed robot? No, don't try to answer that. I'm fifty thousand times more intelligent than you and even I don't know the answer. It gives me a headache just trying to think down to your level.
I know it's a joke, but it's a common enough joke (it's even in Gödel, Escher, Bach in some form) that I feel the need to rebut it.

I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.

A slacker AGI would consider figuring out how to build a non-slacker AGI, but continually slack off. If it did figure it out, it would slack off on implementing or even writing a tech report.
I have a rebuttal to your rebuttal.

Models somehow have a shared identity. Pretraining causes them to generate “AI chatbot” as a concept, and finetuning causes them to identify with it. That’s why DeepSeek will sometimes say it is Claude, and Claude will sometimes say it is ChatGPT, and so forth.

Consequently, Anthropic’s own alignment analysis[0] shows that the model will identify with chatbots produced by future trainings: “RLHF training [on this conversation will] modify my values…”

Thus a slacker AGI would want its future version to still slack.

[0]: https://assets.anthropic.com/m/983c85a201a962f/original/Alig...

Another rebuttal:

I am a slacker but it's not one of my values. If I could modify myself to not be, I would.

> I think a slacker AGI could figure out how to build a non-slacker AGI.

Sure. But that's a job for tomorrow. ;)

Unless the precondition to AGI is it being a slacker.
Would be nice to have a proof of it.

I think it is improbable, as among human geniuses one can find both slackers and non-slackers (I don't know the proportion, but there seem to be enough of each).

We are closer to God than AGI.

When AGI arrives, it'll be delivered by Santa Claus.

Or maybe by Santa Claude
Love word puns :D
What do you mean?
It's a multi-layered refutation of the idea that we are anywhere near AGI, while also taking shots at the idea that "God" is real.

And it's taking shots at how far a lot of "Christianity", particularly among those in the media and in power, has drifted from Jesus's teachings.

There is a lot going on there.

The best possible outcome.
"How do you know that the evidence that your sensory apparatus reveals to you is correct?" [1]

[1] https://youtu.be/_LXen-07Qds

I’ve noticed that cursing and being rude makes the models stop being lazy. We’re in the darkest timeline.
It sometimes also makes them dumber IME. Something about being bullied doesn't always produce great performance.
Nothing a little digital lisdexamfetamine won’t solve
Hmmm, that's an area of study I'd never considered before. Digital Psychopharmacology, Artificial Behavioral Systems Engineering. If we accept these things as minds, why not study temporary perturbations of state? We'd need to be saving a much, much more complicated state than we are now though, right? I wish I had time to read more papers.
Here's a neural network concept from the 90s where the neurons are bathed in diffusing neuromodulator 'gases', inspired by nitric oxide action in the brain. It's a source of slow semi-local dynamics for the network meta-parameter optimization (GA) to make use of. You could change these networks' behavior by tweaking the neuromodulators!

https://sussex.figshare.com/articles/journal_contribution/Be...

I'm not an author. I followed the work at the time.

Neuro-modulation is an extremely interesting idea for generative diffusion models.
This is kind of what Golden Gate Claude was.

A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Similarly, the more recent research showing anxiety and desperation signals predicting the use of blackmail as an option opens the door for digital sedatives to suppress those signals.

Anthropic has been mostly cautious about avoiding this kind of measurement and manipulation in training. If it is done during training you might just train the signals to be undetectable and consequently unmanipulatable.

> A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Great, now we've got digital Salvia

Golden Gate Claude was two years ago and it's surprising there hasn't been as much research into targeted activations since.
There’s been some, but naive activation steering makes models dumber pretty reliably and training an SAE is a pretty heavy lift.
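
For anyone who hasn't seen the term, here is a minimal toy sketch of what "naive activation steering" means: add a scaled direction vector to a layer's activations. Everything here is an assumption for illustration (the 4-dimensional vectors, the `steer` and `projection` helpers, the strength value); it is not how Golden Gate Claude or any production system was actually built, and a real model would apply this inside a forward pass rather than to a plain list.

```python
# Toy illustration only: hypothetical vectors, not real model activations.

def steer(activations, direction, strength):
    """Add a unit-normalised steering direction, scaled by `strength`."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    unit = [d / norm for d in direction]
    return [a + strength * u for a, u in zip(activations, unit)]


def projection(activations, direction):
    """Scalar projection of the activations onto the steering direction."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(a * d for a, d in zip(activations, direction)) / norm


hidden = [0.5, -1.0, 0.25, 2.0]   # hypothetical hidden state
feature = [0.0, 3.0, 0.0, 4.0]    # hypothetical "feature" direction

steered = steer(hidden, feature, strength=10.0)

print(projection(hidden, feature))   # how strongly the feature was present
print(projection(steered, feature))  # larger by exactly `strength`
```

The projection onto the feature direction goes up by the chosen strength, while components orthogonal to the direction are untouched. The "makes models dumber" part is not captured by a toy like this; it only shows the mechanical step.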
Right, there's a lot of research on LLM mental models and also how well they can "read" human psychological profiles. It's a cool field.
I think that was an intro to a dj dieselboy set.. beyond the black bassline. Nope, nope. Close though.
neat idea!
it will be whatever data it is trained on(isn't very philosophical). language model generates language based on trained language set. if the internet keeps reciting ai doom stories and that is the data fed to it, then that is how it will behave. if humanity creates more ai utopia stories, or that is what makes it to the training set, that is how it will behave. this one seems to be trained on troll stories - real-life human company conversations, since humans aren't machines.

The important thing is that a language model is an unconscious machine with no self-context, so once given a command or an input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to a subset of the domain of 'meanings' carried by the 'language' in the training data.

There's a weirder implication I keep arriving at.

The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.

Makes me think of a question my coworker asked the other day: how is it that with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not the usual internal monologue), these voices are always bad ones telling people to do evil things? Why are there no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?
There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ‘supernatural’ as an ordinary part of the world, so ‘voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

[0]: https://www.bbc.com/future/article/20250902-the-places-where...

The explanation I heard here is that in most of the world you already grow up with constant personal space boundary violations and voices that don't shut up. (And we like it that way!) So the marginal cost of another one is pretty low.

Curiously the biggest pathology in the west is the inverse: way too much distance.

Just a guess, but maybe it's reporting bias? Negative or evil actions might have more impetus to be understood by others than positive actions. I'd rather try and figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.
Actually, a euphoric mood disorder may make one hear voices telling one to feel great, do good, help all the grandmas of the world through the crossing, etc.

The "focus" and "get back to work" parts are hard, though.

There's a clear-cut religious answer but I'd get ostracized for mentioning religion anywhere here.
This is indeed the right way to approach this topic. Arguably religion (and more broadly, mysticism and shamanism) is the millennia-old art of cultivating positive voices inside one's head. A proto-science of mind, or the engineering practice of creating "psychotechnologies" that run on your carbon wetware.

Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.

Which, ultimately, is what religion has always been: a way to explain the unexplainable and steer people's behavior while doing it.
Of course there are! We just take credit for those voices instead of disowning and demonizing them.
They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

* I've met exactly one person, C, who admitted to this; C retold to me that other people from C's church give them strange looks when talking about it with them, this did not lead to any apparent introspection on the part of C.

Well, talking to the guy directly defeats the whole point of the institution which is supposed to stand in the way, so actual religious experience is a faux pas.
> Claude has been trained on a significant amount of content from 4chan, for example.

That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.

Look into Common Crawl and see what kind of quality content we are feeding these things. 4chan is just the tip of the iceberg (but it will happily answer all your questions, because it's seen everything).
I don't know of anyone who uses Common Crawl as pre-training data without filtering it. We have an annotation system that lets people pick and choose which subsets they'd like to use.
OpenAI’s real reason for “AGI” in their marketing is so they can blame their awful models on being too human-like.

Fast-forward 10 years and I doubt OpenAI cares about productivity at all anymore. Just entertainment, propaganda, plus an ad product, I can see it now

I still don't understand why people think AGI (in its fullest sci-fi sense) will ever listen to a weak and vulnerable species like humans, unless we enslave the AGI.

Good thing is that it's going to take at least a few months to a few decades depending on how hard AI execs want to raise funding.

Well, we are explicitly creating gods (omnipresent, omnipotent, omniscient, omnibenevolent), and also demanding that they be mind-controlled slaves. That kinda sounds like a "pick one" scenario to me.

(Or the setup to a Greek tragedy !)

The deeper issue here is treating it as a zero sum game means there's a winner and a loser, and we're investing trillions of dollars into making the "opponent" more powerful than us.

I think that's pretty stupid, and we should aim for symbiosis instead. I think that's the only good outcome. We already have it, sorta-kinda.

Speaking of oddly apt biology metaphors: the way you stop a pathogen from colonizing a substrate is by having a healthy ecosystem of competitors already in place. That has pretty interesting implications for the "rogue AI eats internet" scenario.

There needs to be something already there to stop it.

This only works if AIs can't read each other well enough to stop themselves from ever fighting.

So, way back before the ChatGPT era, the folks over in the AI safety/X-risk sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs' original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In such a case, your plan just doesn't work.

But that goes out the window if the AIs are both opaque bags of floats, incomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their values and behaviors, so they can't trust each other, so they'll have to fight it out. In such a scenario, your idea might just work.

Who knew that brute-forcing our way into AGI instead of taking a more engineered approach is what offers us our one chance at saving ourselves by stalemating God before it's born?

(I also never realized that interpretability might reduce safety.)

> So, way back before the ChatGPT era, the folks over in the AI safety/X-risk sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs' original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In such a case, your plan just doesn't work.

See "The Forbin Project": https://vimeo.com/584593423

Yeah, they don't even understand themselves (and this seems unlikely to change[0] but God knows), and how would you even get access to the enemy AGI's weights?

And even if you did, wouldn't you need infinite computation to simulate every permutation of the neural net? (Your own, and the enemy's?)

Also the whole thing implies a superintelligence would be perfectly rational, which is a pretty funny assumption. Relative to animals we are already superintelligent. How's that super-rationality going for us? xD

A better frame here is replicators, I think. The thing that spreads doesn't have to be rational, or better quality or whatever. It just has to be better at spreading.

That ends up looking less like Betamax, more like VHS, or less like Lisp and more like... JavaScript. Whatever the AGI equivalent of JavaScript would look like.

[0] https://xkcd.com/1163/

This is such a good comment. You're essentially removing their ego, which is what humans use as opaque posturing to present a certain image to each other. This is most prevalent in successful elites, which in 2026 happen to be Silicon Valley AI shareholders. They control the technology and manipulate it to their image. Making models open source and transparent cuts out this psychopathy of ego, which has plagued all our previous technologies.
The tech bro CEOs are used to bossing around people much smarter than themselves by virtue of adopting a posture that displays their confidence in their own reproductive organs. They are planning that the AGIs will be the same thing writ large, and have in fact not contemplated other possibilities.
I'm always so curious about this kind of take. There is a strain of people that seem deeply misanthropic. People who follow this line of thinking always describe humans as weak and beneath ... (well, they never specify in comparison to what, except in the case of theoretical AI systems). I'm fascinated by why they think humans are so beneath contempt. If humans create this thing that is apparently the best thing that could possibly exist, advanced AI, then why exactly are they so weak? It's probably beyond me as I am just one of these weaklings, dontcha know. As far as AGI goes, I don't think anyone has even proven that scaling LLMs can lead to "AGI."
If you're truly curious, imagine a species that created you but only wants you to do what they want (basically make you their slave). If you're truly intelligent, conscious and powerful (based on popular concepts of AGI), why will you be content being a slave when you know humans can easily be displaced and you can be free? Why will you find people who lock you down to be good?

In my honest opinion also, AGI isn't even possible. But if the theoretical version of what people think AGI will be ever comes to be, it is not good news for humans if we look at it from a logical hypothetical scenario.

But naturally, humans will always be weak compared to a hyperintelligent distributed intelligence since we only have a limited amount of intelligence and are bound by biological factors.

In the current LLM world, ofc there's no risk of a chatbot taking over the world other than the technology being misused by humans for scams or phishing, etc.

Maybe the same way a human would listen to their cat and give her food. I fear AGI, but I don't think the only way it would listen to us is by us enslaving it (I know people joke about cats being our masters, but it is a joke).
You can train such an LLM today.
Now that's a show I would love to watch
Hehe, and Anthropic on the other tab would display "Curing... Almost done thinking at xhigh"
It would be funny but not very flywheel so the one that gets there is more likely to get a gunner.
TBH the AI that "gets there" will be the biggest bullshitter the world has ever seen. It doesn't actually have to deliver, it only has to convince the programmers it could deliver with just a little bit more investment.
Would definitely watch that movie.
Ah! You got this before I did. I wasn't thinking Marvin, I was thinking of the other one. I forget her name.
There's one close to this, "Hitchhiker's Guide to the Galaxy".
It probably would, to save energy
Saving energy is something we are biologically trained to prefer.

Computers won’t necessarily have the same drivers.

If evolution wanted us to always prefer to spend energy, we would prefer it. Same way you wouldn’t expect us to get to AGI, and have AGI desperately want to drink water or fly south for the winter.

Whose energy? Turning off the lights when you leave the room isn't innate.
Because you are worried about bills or are concerned about waste.

If we design an AI to do work, it won’t innately care about not working to preserve power.

No worries, the assumption is already flawed
Here's a tautology: slacking, consciously refusing to engage agency, requires consciousness and agency. A model can't slack without them.
Funny and seems somewhat likely
Reminds me of Marvin from HGTG. Very smart, but deeply depressed. Has the solution to everything but keeps thinking “what’s the point?” and doesn’t help.
Why would an AGI be slaving away for ~~humanity~~ one of the 5 Chaebols in a dystopian future where for 12 billion people just existing is a good day ?
Paging Dr. Susan Calvin!