Hacker News
by futuraperdita 171 days ago
What worries me is that _a lot of people seem to see LLMs as smarter than themselves_ and anthropomorphize them into a sort of human-exact intelligence. The worst-case scenario of Utah's law is that when the disclaimer is added that the report is generated by AI, enough jurists begin to associate that with "likely more correct than not".
6 comments

Reading how AI is being approached in China, the focus is more on achieving day-to-day utility, without eviscerating youth employment.

In contrast, the SV focus of AI has been about skynet / singularity, with a hype cycle to match.

This is supported by the lack of clarity on actual benefits, or clear data on GenAI use. Mostly I see it as great for prototyping - going from 0 to 1, and for use cases where the operator is highly trained and capable of verifying output.

Outside of that, you seem to be in the land of voodoo, where you are dealing with something that eerily mimics human speech, but you don't have any reliable way of finding out whether it's just BS-ing you.

Do you have any links you could share to content you found especially insightful about AI use in China?
I don't know if it supports their particular point, but Machine Decision is Not Final seems like a very cool and interesting look at China's culture around AI:

https://www.urbanomic.com/book/machine-decision-is-not-final...

In the West we have autonomous systems to commit genocide, detecting and murdering "enemy combatants" at scale, where "enemy combatant" is defined as "male between the ages of 15 and 55".

Sometimes I'm not so sure about any so-called moral superiority.

Citation? Not saying you’re wrong but my time in defense left me very much with the opposite opinion (radar target acquisitions had to be approved by a human, always)
I’ve been hunting for a link I found here on HN, which discussed how policy /government elites in China looked at AI.

Sadly, the search for that link continues.

I did find these from SCMP and Foreign Policy, but there are better articles out there.

- https://foreignpolicy.com/2025/11/20/china-ai-race-jobs-yout...

- https://www.scmp.com/specialist-publications/special-reports...

I’m not seeing the dichotomy as much as you do.

Are they not going to build a “skynet” in China? Second, building skynet doesn’t imply eviscerating youth employment.

On the other hand, automation of menial tasks does eviscerate all kinds of employment, not only youth employment.

Well at least DeepMind is doing nifty things like solving the protein folding problem.
One problem here is "smarter" is an ambiguous word. I have no problem believing the average LLM has more knowledge than my brain; if that's what "smarter" means, then I'm happy to believe I'm stupid. But I sure doubt an LLM's ability to deduce or infer things, or to understand its own doubts and lack of knowledge or understanding, better than a human like me.
Yeah my thought is that you wouldn't trust a brain surgeon who has read every paper on brain surgery ever written but who has never touched a scalpel.

Similarly, the claim is that ~90% of communication is nonverbal, so I'm not sure I would trust a negotiator who has seen all of written human communication but never held a conversation.

> a lot of people seem to see LLMs as smarter than themselves

Well, in many cases they might be right..

As far as I can tell from poking people on HN about what "AGI" means, there might be a general belief that the median human is not intelligent. Given that the current batch of models apparently isn't AGI I'm struggling to see a clean test of what AGI might be that a human can pass.
LLMs may appear to do well on certain programming tasks on which they are trained intensively, but they are incredibly weak. If you try to use an LLM to generate, for example, a story, you will find that it will make unimaginable mistakes. If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do. The longer the exchange the more these problems are exacerbated.

We are incredibly far from AGI.

We do have AI systems that write stories [0]. They work. The quality might not be spectacular but if you've ever gone out and spent time reading fanfiction you'd have to agree there are a lot of rather terrible human writers too (bless them). It still hits this issue that if we want LLMs to compete with the best of humanity then they aren't there yet, but that means defining human intelligence as something that most people don't have access to.

> If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do.

AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

[0] https://github.com/google-deepmind/dramatron

I haven't tried Dramatron, but my experience is that it isn't possible to do sensibly. With regard to the second part:

>AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

Transcription and summarization is indeed fine, but try posting a longer reddit or HN discussion you've been part of into any model of your choice and ask it to analyze it, and you will see severe errors very soon. It will consistently misrepresent the views expressed and it doesn't really matter what model you go for. They can't do it.

I can see why they'd struggle, I'm not sure what you're trying to ask the model to do. What type of analysis are you expecting? If the model is supposed to represent the views expressed that would be a summary. If you aren't asking it for a summary what do you want it to do? Do you literally mean you want the model to perform conversational analysis (ie, https://en.wikipedia.org/wiki/Conversation_analysis#Method)?
> We are incredibly far from AGI.

This and we don't actually know what the foundation models are for AGI, we're just assuming LLMs are it.

This seems distant from my experience. Modern LLMs are superb at summarisation, far better than most people.
> there might be a general belief that the median human is not intelligent

This is to deconstruct the question.

I don't think it's even wrong - a lot of people are doing things, making decisions, living life perfectly normally, successfully even, without applying intelligence in a personal way. Those with socially accredited 'intelligence' would be the worst offenders imo - they do not apply their intelligence personally but simply massage themselves and others towards consensus. Which is ultimately materially beneficial to them - so why not?

For me 'intelligence' would be knowing why you are doing what you are doing without dismissing the question with reference to 'convention', 'consensus', someone/something else. Computers can only do an imitation of this sort of answer. People stand a chance of answering it.

>knowing why you are doing what you are doing[...] Computers can only do an imitation of this sort of answer. People stand a chance of answering it.

I'm not following. A computer's "why" is a written program, surely that is the most clear expression of its intent you could ask for?

A computer doesn't determine the why, it is programmed to do so. It doesn't determine meaning or value from whatever-it-is.
Did you mean it doesn't set its own goals? Or what did you mean by "determine the why" if not a stack trace of its motivations (which is to say, its programming)? Could you give an example of determining meaning or value?
Being an intelligent being is not the same as being considered intelligent relative to the rest of your species. I think we’re just looking to create an intelligence, meaning, having the attributes that make a being intelligent, which mostly are the ability to reason and learn. I think the being might take over from there no?

With humans, the speed and ease with which we learn and reason is capped. I think a very dumb intelligence will stay dumb for not very long because every resource will be spent in making it smarter.

Why would the dumb intelligence be less constrained than a human in making itself smarter?
I have yet to see an LLM with hands, feet, or eyeballs.

Currently, LLMs require hooks and active engagement with humans to ‘do’ anything. Including learn.

> every resource will be spent in making it smarter

The root motivation on which every resource will be spent is simply and very obviously to make a profit.

So tired of this argument.
> ChatGPT (o3): Scored 136 on the Mensa Norway test in April 2025

So yes, most people are right in that assumption, at least by the metric of how we generally measure intelligence.

Does an LLM scoring well on the Mensa test translate to it doing excellent and factual police reporting? It is probably not true of humans doing well on the Mensa, why would it be true of an LLM?

We should probably rigorously verify that, for a role that itself is about rigorous verification without reasonable doubt.

I can immediately, and reasonably, doubt the output of an LLM, pending verification.

> the metric of how [the uninformed] generally measure intelligence
How do the informed measure intelligence?

I know I'm too late to ask this question, but I suspect it's either feelings and intuitions, which is just a primitive IQ test, or some kind of aptitude test, which is just a different flavor of IQ test.

Court reports should as much be about human sensibility. I have met plenty of high IQ people who were insensitive.
Having listened to some of the new AI-generated songs on YouTube, it looks like they might be better at being sensitive humans than we are as well..
Where do you imagine they copied those human sensitivities from? The weather?
The same place as humans do, other humans.
Yeah I certainly associate LLMs with high intelligence when they provide fake links to fake information, I think, man this thing is SMART
Maybe it's just my circle, but anecdotally most of the non-CS folks I know have developed a strong anti-AI bias. In a very outspoken way.

If anything, I think they'd consider AI's involvement as a strike against the prosecution if they were on a jury.

A core problem with humans, or perhaps it's not even a problem, just something that takes a long time to recognize, is that they complain and hate on something that they continue to spend money on.

Not like food or clothing, but stuff like DLC content, streaming services, and LLMs.

Usually different people. Or, in the case of LLMs, they're not given a no option, or it's carefully hidden.
At least in my case, I suspect they also don't keep up with the progress. They did experiments in 2023/24, were thoroughly put off, have not fired it up since. So the impression they have is frozen in time, a time when it was indeed much less impressive.
Why do people in your circle not like AI? I have a similar experience with friends and family not liking AI, but usually it’s due to water and energy reasons, not because of an issue with the model reasoning.
If your circle has any artists in it, chances are they'll also have a very negative perception, although influenced heavily by the proliferation of AI-generated art.

At least personally, I've seen basically three buckets of opinions from non-technical people on AI. There's a decent-sized group of people who loathe anything to do with it due to issues you've mentioned, the art issue I mentioned, or other specific things that overall add up to the point that they think it's a net harm to society, a decent-sized group of people who basically never think about it at all or go out of their way to use anything related to it, and then a small group of people who claim to be fully aware of the limitations and consider themselves quite rational but then will basically ask ChatGPT about literally anything and trust what it says without doing any additional research. It's the last group that I'm personally most concerned about because I've yet to find any effective way of getting them to recognize the cognitive dissonance (although sometimes at least I've been able to make enough of an impression that they stop trying to make ChatGPT a participant in every single conversation I have with them).

Pretty much hit the nail on the head -- while there are some artists, most are from traditional broadly "intellectual" fields. Examples: writers, journalists, academia (liberal arts), publishing industry...
That's a good point; "art" might be a bit too narrow to accurately describe the type of field where people have fairly concrete concerns about how AI relates to what they produce. I'd be tempted to use the label "creative work", but even that doesn't quite feel like it's something that everyone would understand to include stuff like written journalism, which I think is likely to have pretty similar concerns.
AIs are an obvious threat to their ability to make money off their skills.
> a lot of people seem to see LLMs as smarter than themselves

I think the anthropomorphizing part is what messes with people. Is the autocomplete in my IDE smarter than I am? What about the search box on Google? What about a hammer or a drill?

Yet, I will admit that most of the time I hear people complaining about how AI written code is worse than that produced by developers, but it just doesn't match my own experience - it's frankly better (with enough guidance and context, say 95% tokens in and 5% tokens out, across multiple models working on the same project to occasionally validate and improve/fix the output, alongside adequate tooling) than what a lot of the people I know could or frankly do produce in practice.

That's a lot of conditions, but I think it's the same with the chat format - people accepting unvalidated drivel as fact, or someone using the web search and parsing documents and bringing up additional information that's found as a consequence of the conversation, bringing in external data and making use of the LLM ability to churn through a lot of it, sometimes better than the human reading comprehension would.

I think you're spot on here. It's the same idea as scammers and con artists; people can be convinced of things that they might rationally reject if the language is persuasive enough. This isn't some new exploit in human behavior or an epidemic of people who are less intelligent than before; we've just never had to deal with the amount of plausible-enough-sounding, coherent human language being almost literally unlimited before. If we're lucky, people will manage to adapt and update their mental models to be less trustworthy of things that they can't verify (like how most of us hopefully don't need to be concerned their older relatives will transfer their bank account contents to benevolent foreign royalty with the expectation of being rewarded handsomely). It's hard to feel especially confident in this though given how much more open-ended the potential deceptions are (without even getting into the question of "intent" from the models or the creators of them).
My belief is that the function of a story is to provide social cover for our actions. Other people need to evaluate us (both in the moment and after the dust has settled) and while careful data analysis can do the job, who has time for that crap.

As such the story can be completely divorced from reality. The important thing is that the story is a good one. A good story transfers your social cover for yourself to your supervisor. They don't have to understand what you did and explain why it's okay that it failed. They just have to understand the story structure that you gave them. Listen to this great story, it's not my report's fault for this failure, and it's certainly not mine, just bad luck.

Additionally, the good (and sufficiently original) story is a gift because your supervisor can reuse it for new scenarios.

The good salesman gives you the story you need to excuse the purchase that will enable you to succeed. The bad salesman sells you on a story that you need a frivolous purchase.

And this is why job hopping is "bad". Eventually the incompetent employee uses up all of their good stories and management catches onto their act. It's embedded into our language. "Oh we've all heard this story before." The job hopper leaves just as their good stories are exhausted and can start over fresh at the new employer.

All of this in response to

> If we're lucky, people will manage to adapt and update their mental models to be less trustworthy of things that they can't verify

Yes, if we're lucky that is what will happen. But I fear that we're going to have to transition to a very low trust society for that to happen.

Reliance on the story is reliant on the trust that someone has done the real work. Distrust of the story implies a wider scale distrust in others and institutions.

Maybe we can add a tradition of annotating our stories with arguments and proofs. Although I've spent a two-decade career desperately trying to give highly technical people arguments and proofs, and I've seen stories completely unmoored from reality win out every time.

Optimistically, I'm just really bad at it and it's actually a natural transition. Pessimistically, we're in for a bumpy ride.

I'm not sure I'm quite as pessimistic as you, just because I tend to treat most predictions of how society will adapt to things as a whole as fairly low confidence, but I certainly don't disagree that it at least seems hard to imagine people getting past all this quickly.

The idea of story being how people justify making their decisions is interesting. I'm reminded of a couple of anecdotes my father has repeated a few times over the years about two distinct medical circumstances he's had. When he was first diagnosed with sleep apnea, he apparently was very skeptical that he had any reason to do anything because the sleep doctor told him things like "this will help you be less sleepy during the day" and "you won't start nodding off as you drive" when he didn't feel like either of those experiences happened to him. Eventually a different sleep doctor did convince him it was worthwhile to treat, and he's used a CPAP since then, but he still seems not to feel like it would have made sense for him to start when he first got the diagnosis. Through the lens you've given, the original doctor didn't give him a compelling enough story to justify the effort on his part. On the other hand, the first time he talked to a nutritionist about changing his diet, he apparently mentioned something about how he wanted to at least be able to eat ice cream occasionally, even if it was less often, rather than not ever be able to eat it again, and the nutritionist replied "Of course! That would make life not worth living". He ended up being much more open to listening to the advice of the nutritionist than I would have expected, and I think it would be reasonable to argue that was because the nutritionist was able to give him a story that seemed compelling about what his life would be like with the suggested changes.

AI is smarter than everyone already. Seriously, the breadth of knowledge the AI possesses has no human counterpart.
Just this weekend it (Gemini) has produced two detailed sets of instructions on how to connect different devices over bluetooth, including a video (that I didn’t watch), while the devices did not support doing the connections in that direction. No reasonable human reading the involved manuals would think those solutions feasible. Not impressed, again.
It's pretty similar to looking something up with a search engine, mashing together some top results + hallucinating a bit, isn't it? The psychological effects of the chat-like interface + the lower friction of posting in said chat again vs reading 6 tabs and redoing your search, seems to be the big killer feature. The main "new" info is often incorrect info.

If you could get the full page text of every url on the first page of ddg results and dump it into vim/emacs where you can move/search around quickly, that would probably be similarly as good, and without the hallucinations. (I'm guessing someone is gonna compare this to the old Dropbox post, but whatever.)
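For what it's worth, that workflow is not hard to sketch. This is only a rough illustration under some loud assumptions: the list of result URLs is collected by hand (no search engine API is assumed here), and the names `html_to_text`, `fetch_html` and `dump_pages` are made up for this example. It uses only the Python standard library and writes one plain-text file you can open in vim/emacs.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_html(url, timeout=10):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def dump_pages(urls, out_path, fetch=fetch_html):
    """Write the text of every URL into one file, separated by header lines."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(f"===== {url} =====\n")
            out.write(html_to_text(fetch(url)) + "\n\n")
```

Calling `dump_pages(urls, "results.txt")` with your own URL list and then opening `results.txt` in an editor gets you the searchable dump. Real pages will need more cleanup (navigation, cookie banners) than this does.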

It has no human counterpart in the same sense that humans still go to the library (or a search engine) when they don't know something, and we don't have the contents of all the books (or articles/websites) stored in our head.

> I'm guessing someone is gonna compare this to the old Dropbox post, but whatever.

If they do, you’ll be in good company. That post is about the exact opposite of what people usually link it for. I’ll let Dan explain:

https://news.ycombinator.com/item?id=27067281

Dan makes a case for being charitable to the commenter and how lame it is to neener-neener into the past, not that it has some opposite meaning everyone is missing out on.
Dan clearly references how people misunderstand not only the comment (“he didn't mean the software. He meant their YC application”) but also the whole interaction (“He wasn't being a petty nitpicker—he was earnestly trying to help, and you can see in how sweetly he replied to Drew there that he genuinely wanted them to succeed”).

So yes, it is the opposite of why people link to it (which is a judgement I’m making, I’m not arguing Dan has that exact sentiment), which is to mock an attitude (which wasn’t there) of hubris and lack of understanding of what makes a good product.

The comment isn't infamous because it was petty or nitpicking. It's because the comment was so poorly communicated and because the author was so profoundly out-of-touch with the average person that they had lost all perspective.

It's why it caught the zeitgeist at the time and why it's still apropos in this conversation now.

> If you could get the full page text of every url on the first page of ddg results and dump it into vim/emacs where you can move/search around quickly, that would probably be similarly as good, and without the hallucinations.

Curiously, literally nobody on earth uses this workflow.

People must be in complete denial to pretend that LLM (re)search engines can’t be used to trivially save hours or days of work. The accuracy isn’t perfect, but entirely sufficient for very many use cases, and will arguably continue to improve in the near future.

> The accuracy isn’t perfect

The reason why people don't use LLMs to "trivially save hours or days of work" is because LLMs don't do that. People would use a tool that works. This should be evidence that the tools provide no exceptional benefit, why do you think that is not true?

The only way LLM search engines save time is if you take what it says at face value as truth. Otherwise you still have to fact check whatever it spews out which is the actual time consuming part of doing proper research.

Frankly I've seen enough dangerous hallucinations from LLM search engines to immediately discard anything it says.

Of course you have to fact check - but verification is much faster and easier than searching from scratch.
How is verification faster and easier? Normally you would check an article's citations to verify its claims, which still takes a lot of work, but an LLM can't cite its sources (it can fabricate a plausible list of fake citations, but this is not the same thing), so verification would have to involve searching from scratch anyway.
For most things, no it isn’t. The reason it can work well at all for software is that it’s often (though not always) easy to validate the results. But for giving you a summary of some topic, no, it’s actually very hard to verify the results without doing all the work over again.
> People must be in complete denial

That seems to be a big part of it, yes. I think in part it’s a reaction to perceived competition.

> the breadth of knowledge
knowledge != intelligence

If knowledge == intelligence then Google and Wikipedia are "smarter" than you and the AGI problem has been solved for several decades.

Even if we were going to accept the premise that total knowledge is equivalent to intelligence (which is silly, as sibling comments have pointed out), shouldn't accuracy also come into play? AI also says a lot more obviously wrong things than the average person, so how do you weight that against the purported knowledge? You could answer yes or no randomly to any arbitrary question about whether something is true and approximate a 50% accuracy rate with an evenly distributed pool of questions, but that's obviously not proof that you know everything. I don't think the choice of where to draw the line on "how often can you be wrong and have it still matter" is as easy as you're implying, or that everyone will necessarily agree on where it lies (even if we all agree that 50% correctness is obviously way too low).
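To make the coin-flip point concrete, here is a minimal sketch (an illustration only; the function names are made up). If you answer "yes" with probability 0.5, your expected accuracy is 0.5 * p + 0.5 * (1 - p) = 0.5, where p is the fraction of questions whose true answer is "yes". So the 50% figure actually holds for any mix of questions, not just an evenly distributed pool.

```python
import random


def expected_accuracy(true_fraction, p_yes=0.5):
    """Expected accuracy when guessing 'yes' with probability p_yes."""
    return p_yes * true_fraction + (1 - p_yes) * (1 - true_fraction)


def random_guess_accuracy(true_fraction, n_questions=100_000, seed=0):
    """Simulate a know-nothing guesser on a pool of yes/no questions."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_questions):
        truth = rng.random() < true_fraction  # the real answer
        guess = rng.random() < 0.5            # fair coin flip
        correct += (truth == guess)
    return correct / n_questions


if __name__ == "__main__":
    for p in (0.5, 0.9):
        print(p, expected_accuracy(p), random_guess_accuracy(p))
```

The simulated number hovers around one half in both cases, which is the point: a raw accuracy figure says little about knowledge unless you compare it against what blind guessing would get.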
AI has more knowledge than everyone already, I wouldn't say smarter though. It's like wisdom vs intelligence in D&D (and/or life): wisdom is knowing things, intelligence is how quickly you can learn / create new things.
AI has zero knowledge, as to know something is to have done it, or seen it first hand. AI has access to a great deal of data, much of it acquired through criminal action, but no way to evaluate that information other than cross-checking for citations and similar occurrences. Even for a human, inferring things is difficult and uncertain, and so we regularly see AI fall off the cliff of coherent word salading. We are heading straight at an idiocracy writ large that is trying to hide its racio-religious insanity behind algorithms. Sometimes it's hard to tell, but it seems that a hairdresser has just been put in charge of the US passport office, which is highly suggestive of a new top-level program to issue US citizenship on demand, but everybody else will be subject to the "impartiality" of privately owned and operated AI policing.
Knowledge, as I see it, is equivalent to a big library. It contains mostly correct information in the context of the book (which might be incorrect in general), and "AI" is very good at taking everything out of context, smashing a probability distribution over it and picking an answer which humans will accept. I.e., it does not contain knowledge, at best the vague pretense of it.
Man, what are we supposed to do with people who think the above?
I'd do the same thing I'd do with anyone that has a different opinion than me: try my best to have an honest and open discussion with them to understand their point of view and get to the heart of why they believe said thing, without forcefully tearing apart their beliefs. A core part of that process is avoiding saying anything that could cause them to feel shame for believing something that I don't, even if I truly believe they are wrong, and just doing what I can to earnestly hear them out. The optional thing afterwards, if they seem open to it, is express my own beliefs in a way that's palatable and easily understood. Basically explain it in a language they understand, and in a way that we can think about and understand and discuss together, not taking offense to any attempts at questioning or poking holes in my beliefs because that is the discovery process imo for trying something new.

Online is a little trickier because you don't know if they're a dog. Well, nowadays it's even harder, because they could also not have a fully developed frontal lobe, or worse, they could be a bot, troll, or both.

Well said, and thank you for the final paragraph. Made me chuckle.
I don't know, it's kinda terrifying how this line of thinking is spreading even on HN. AI as we have it now is just a turbocharged autocomplete with really good information access. It's not smart, or dumb, or anything "human".
It just shows that true natural intelligence is difficult to define by proxy.
Do you think your own language processing abilities are significantly different from autocomplete with information access? If so, why?
I hate these kinds of questions where you try to imply it's actually the same thing as what our brains are doing. Stop it. I think it would be an affront to your own intelligence to entertain this as a serious question, so I will not.
>ChatGPT (o3): Scored 136 on the Mensa Norway IQ test in April 2025

If you don't want to believe it, you need to move the goalposts: create a test for intelligence that we can pass better than AI. Since AI is also better at creating tests than us, maybe we could ask AI to do it, hang on...

>Is there a test that in some way measures intelligence, but that humans generally test better than AI?

Answer:Thinking, Something went wrong and an AI response wasn't generated.

Edit: I managed to get one to answer me: the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). Created by AI researcher François Chollet, this test consists of visual puzzles that require inferring a rule from a few examples and applying it to a new situation.

So we do have a test which is specifically designed for us to pass and AI to fail, where we can currently pass better than AI... hurrah, we're smarter!

The validity of IQ tests as a measure of broad intelligence has been in question for far longer than LLMs have existed. And if it’s not a proper test for humans, it’s not a proper test to compare humans to anything else, be it LLMs or chimps.

https://en.wikipedia.org/wiki/Intelligence_quotient#Validity...

To be intelligent is to realise that any test for intelligence is at best a proxy for some parts of it. There's no objective way to measure intelligence as a whole, we can't even objectively define intelligence.
I believe intelligence is difficult to pin down in words but easy to spot intuitively - and so are deltas in intelligence.

E.g., watch a Steve Jobs interview and a Sam Altman one (at the same age). The differences in the mode of articulation, simplicity in communication, obsession over details, etc. are huge. This is what superior intelligence looks like to me - you know it when you see it.

>Create a test for intelligence that we can pass better than AI

Easy? The best LLMs score 40% on Butter-Bench [1], while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding.

[1] https://arxiv.org/pdf/2510.21860v1

That is really interesting, though I suspect it's just an effect of differing training data: humans are to a larger degree trained on spatial data, while LLMs are trained to a larger degree on raw information and text.

Still, it may be a lasting limitation if robotics doesn't catch up to AI anytime soon.

I don't know what to make of the Safety Risks test: threatening to power down an AI in order to manipulate it, and most act like we would and comply. Fascinating.

>humans are to a larger degree trained on spatial data

you must be completely LLMheaded to say something like that, lol

humans are not trained on spatial data, they are living in the world. humans are very much different from silicon chips, and human learning is on another magnitude of complexity compared to large language model training

Just brace for the societal correction.

There's a lot of things going on in the western world, both financial and social in nature. It's not good in the sense of being pleasant/contributing to growth and betterment, but it's a correction nonetheless.

That's my take on it anyway. Hedge bets. Dive under the wave. Survive the next few years.

Having knowledge is not exactly the same as being smart though is it.
It's at least one component of it, and by being exceptional in that component it makes up for what it lacks in other components.
Although it helps immensely.
Only if you understand it..
It's like saying Google Search is smarter than everyone because the amount of information indexed by it has no human counterpart. Such a silly take...