by citizenpaul 11 days ago
If you had asked me last year (2025), I would still have said LLMs are a silly toy.

As of Jan 2026 I have come to accept that LLMs are at least part of the puzzle of how intelligence works. They are at this point better than the majority of humans at various intellectual tasks. It may not be, or ever become, a 1:1 match, but "good enough" already ran the world before LLMs.

There is not even a formal definition of what intelligence is, so saying LLMs are intelligent can't even be "right/wrong". It's just arguing semantics and definitions.

4 comments

They are better than humans at tasks that require recalling information and applying it to a specific task.

For example, front end web app layout and basic functionality. Anyone can make a website with interactive buttons with ease now, whereas before, you had to go look up examples, try stuff, figure out why it's not working, etc.

But in terms of organization and higher level tasks, like for example making a front end that is clean, robust, easily extensible, and doesn't break, LLMs require almost as much prompting to do this as it takes to actually write the code.

They are mostly "faster" than the majority of humans. They are rarely better than experienced and talented humans at the majority of tasks they are able to do. They are better on both scales on a small thin slice of work tasks.
They're not better than the best humans at practically anything. However, I doubt there's a person alive who could outperform an LLM on a broad suite of tasks like Humanity's Last Exam, and the vast majority of people probably couldn't answer a single question on it.
They’re the language part of the puzzle, which seems to require some basic world modeling, but they can’t make novel models unless there’s an example in their training data.

I think engineering and mathematical thought require spatial reasoning. When I model problems, I see them as 3D shapes: the economy is a series of tubes that money flows through and collects in buckets, programming state is little boxes that hold values, chemical interactions are like keys that fit into locks.

I don’t think LLMs can build models like that, but because they have so much memorized and there usually isn’t a need for a novel model custom fit for a problem, they can fake it by imitation.

Seems like LLMs are that. A bunch of most probable word associations is a network, and you can build a physical model of a network, or build a network that allows you to reason about a physical model. Whether it's just a flowchart or workflow diagram, or an X-dimensional matrix with vectors moving through it.

But the only way to map the network in an LLM is experimentally. You have to prompt it, and see how the coefficients fall in order to construct your most likely walk through the training data.

I think that LLMs can and do come up with novel things through exhaustion, just by applying the relationships between some set of entities to entirely different sets of entities. An accumulation of earlier context pushes up the probability of those other entities being mentioned, so they can easily replace the entities that were more associated with the nearer connective, relationship words.

As such, I think LLMs are good at generating metaphors, and a lot of innovation comes from going "What if As worked like Bs?" Just go through all the As and Bs, toss the ones that don't make any sense and test the ones that seem like they might.

I don't believe you can say that "LLM" is part of intelligence. No single human is exposed to as much text as an LLM ingests, not by many orders of magnitude, and humans still perform cognition and generate new language.
Most LLMs are multimodal now, able to map visual concepts to language and vice versa. If OpenAI's recent Erdos solution was faking math, it faked it very well.
3D isn’t one of the modes, though. I know of a paper from several years back that showed diffusion models don’t actually understand physics or geometry.

I can’t evaluate the Erdos solution personally, but both math and software have many problems that are some combination of other problems, and since an LLM can get instant verification feedback, it can try millions of permutations to discover the right solution. This is valuable and I’m not dismissing it, but I think there’s another tier of harder problems that I don’t believe LLMs can solve, and it will require some further theoretical breakthroughs to get there.

"How can we understand what an LLM is "thinking"? It's clearly very valuable to do so — it could enable steering model behavior, detecting dangerous intent, and more."

Well, that is complete and utter bollocks, dribbled in para three or so, and obviously written by a next token guesser.

LLMs are tools and I'm pretty sure if I let you loose on some of my tools, you might lose an extremity unless I kept an eye on you.

I have an on-prem Qwen3.6-35B-A3B-UD-Q4_K_XL working on a box in the office and it's quite handy for a chat.