
by Night_Thastus 315 days ago

The difference is that the weaknesses of cars were problems of engineering, and some of infrastructure. Neither is very hard to solve, though both take time. The fundamental way cars operated worked and just needed revision, sanding off rough edges.

LLMs are not like this. The fundamental way they operate, the core of their design is faulty. They don't understand rules or knowledge. They can't, despite marketing, really reason. They can't learn with each interaction. They don't understand what they write.

All they do is spit out the most likely text to follow some other text based on probability. For casual discussion about well-written topics, that's more than good enough. But for unique problems in a non-English language, it struggles. It always will. It doesn't matter how big you make the model.

They're great for writing boilerplate that has been written a million times with different variations - which can save programmers a LOT of time. The moment you hand them anything more complex it's asking for disaster.
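The "most likely text to follow some other text" description above can be made concrete with a toy sketch. The following is a word-level bigram counter, not how a real LLM works internally (those use learned neural networks over subword tokens); the corpus and the names `train_bigram`, `most_likely_next`, and `generate` are invented for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(corpus: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.split()
    following: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen with one."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


def generate(model: Dict[str, Counter], start: str, max_words: int = 5) -> List[str]:
    """Greedily extend `start`, appending one most-likely word at a time."""
    out = [start]
    for _ in range(max_words):
        nxt = most_likely_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical toy corpus; "the" is followed by "cat" twice, "mat" once, "fish" once.
TOY_CORPUS = "the cat sat on the mat the cat ate the fish"
toy_model = train_bigram(TOY_CORPUS)
```

With this corpus, `most_likely_next(toy_model, "the")` picks "cat" simply because that pairing occurs most often; nothing in the table represents what a cat is. Whether that limitation carries over to large models is exactly what the thread goes on to argue about.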

5 comments

programd 315 days ago

> [LLMs] spit out the most likely text to follow some other text based on probability.

Modern coding AI models are not just probability crunching transformers. They haven't been just that for some time. In current coding models the transformer bit is just one part of what is really an expert system. The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding. Put it all together and you get one shot performance on generating the type of code that was unthinkable even a year ago.

The point is that the entire expert system is getting better at a rapid pace and the probability bit is just one part of it. The complexity frontier for code generation keeps moving and there's still a lot of low hanging fruit to be had in pushing it forward.

> They're great for writing boilerplate that has been written a million times with different variations

That's >90% of all code in the wild. Probably more. We have three quarters of a century of code in our history so there is very little that's original anymore. Maybe original to the human coder fresh out of school, but the models have all this history to draw upon. So if the models produce the boilerplate reliably then human toil in writing if/then statements is at an end. Kind of like how - barring the occasional mad genius [0] - the vast majority of coders don't write assembly to create a website anymore.

[0] https://asm32.info/index.cgi?page=content/0_MiniMagAsm/index...

motorest 314 days ago

> Modern coding AI models are not just probability crunching transformers. (...) The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding.

It seems you were not aware you ended up describing probabilistic coding transformers. Each and every one of those details is nothing more than a strategy to apply constraints to the probability distributions used by the probability crunching transformers. I mean, read what you wrote: what do you think that "curated training data" means?

> Put it all together and you get one shot performance on generating the type of code that was unthinkable even a year ago.

This bit here says absolutely nothing.

leptons 315 days ago

>The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding.

And even with all that, they still produce garbage way too often. If we continue the "car" analogy, the car would crash randomly sometimes when you leave the driveway, and sometimes it would just drive into the house. So you add all kinds of fancy bumpers to the car and guard rails to the roads, and the car still runs off the road way too often.

mgaunard 315 days ago

Except we should aim to reduce the boilerplate through good design, instead of creating more of it on an industrial scale.

patrickmay 314 days ago

I regret that I have but one upvote to give to this comment.

Every time someone says "LLMs are good at boilerplate" my immediate response is "Why haven't you abstracted away the boilerplate?"

exe34 315 days ago

what we should and what we are forced to do are very different things. if I can get a machine to do the stuff I hate dealing with, I'll take it every time.

mgaunard 315 days ago

who's going to be held accountable when the boilerplate fails? the AI?

danielbln 314 days ago

The buck stops with the engineer, always. AI or no AI.

mgaunard 314 days ago

I've seen juniors send AI code for review. When I comment on weird things within it, the answer is just "I don't know, the AI did that".

exe34 315 days ago

no, I'm testing it the same way I test my own code!

oneneptune 315 days ago

yolo merging into prod on a friday afternoon?

skydhash 315 days ago

It's like the xkcd on automation

https://xkcd.com/1205/

After a while, it just makes sense to redesign the boilerplate and build some abstraction instead. Duplicated logic and data are hard to change and fix. The frustration is a clear signal to take a step back and take a holistic view of the system.

gibbitz 310 days ago

And this is a great example of something I rarely see LLMs doing. I think we're approaching a point where we will use LLMs to manage code the way we use React to manage the DOM. You need an update to a feature? The LLM will just recode it wholesale. All of the problems we have in software development will dissolve in mountains of disposable code. I could see enterprise systems being replaced hourly for security reasons. Less chance of abusing a vulnerability if it only exists for an hour to find and exploit. Since the popularity of LLMs proves that as a society we've stopped caring about quality, I have a hard time seeing any other future.

Night_Thastus 315 days ago

>In current coding models the transformer bit is just one part of what is really an expert system. The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding. Put it all together and you get one shot performance on generating the type of code that was unthinkable even a year ago.

This is lipstick on a pig. All those methods are impressive, but ultimately workarounds for an idea that is fundamentally unsuitable for programming.

>That's >90% of all code in the wild. Probably more.

Maybe, but not 90% of time spent on programming. Boilerplate is easy. It's the 80/20 rule in action.

I don't deny these tools can be useful and save time - but they can't be left to their own devices. They need to be tightly controlled and given narrow scopes, with heavy oversight by an SME who knows what the code is supposed to be doing. "Design W module with X interface designed to do Y in Z way", keeping it as small as possible and reviewing it to hell and back. And keeping it accountable by making tests yourself. Never let it test itself, it simply cannot be trusted to do so.

LLMs are incredibly good at writing something that looks reasonable, but is complete nonsense. That's horrible from a code maintenance perspective.

motorest 314 days ago

> For casual discussion about well-written topics, that's more than good enough. But for unique problems in a non-English language, it struggles. It always will. It doesn't matter how big you make the model.

Not to disagree, but "non-english" isn't exactly relevant. For unique problems, LLMs can still manage to output hallucinations that end up being right or useful. For example, LLMs can predict what an API looks like and how it works even if they do not have the API in context if the API was designed following standard design principles and best practices. LLMs can also build up context while you interact with them, which means that iteratively prompting them that X works while Y doesn't will help them build the necessary and sufficient context to output accurate responses.

windward 314 days ago

>hallucinations

This is the first word that came to mind when reading the comment above yours. Like:

>They can't, despite marketing, really reason

They aren't, despite marketing, really hallucinations.

Now I understand why these companies don't want to market using terms like "extrapolated bullshit", but I don't understand how there is any technological solution to it without starting from a fresh base.

motorest 314 days ago

> They aren't, despite marketing, really hallucinations.

They are hallucinations. You might not be aware of what that concept means in terms of LLMs, but being oblivious to the definition of a concept does not mean the concept doesn't exist.

You can learn about the concept by spending a couple of minutes reading this article on Wikipedia.

https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...

> Now I understand why these companies don't want to market using terms like "extrapolated bullshit", (...)

That's literally in the definition. Please do yourself a favour and get acquainted with the topic before posting comments.

zahlman 314 days ago

> You might not be aware of what that concept means in terms of LLMs

GP is perfectly aware of this, and disagrees that the metaphor used to apply the term is apt.

Just because you use a word to describe a phenomenon doesn't actually make the phenomenon similar to others that were previously described with that word, in all the ways that everyone will find salient.

When AIs generate code that makes a call to a non-existent function, it's not because they are temporarily mistakenly perceiving (i.e., "hallucinating") that function to be mentioned in the documentation. It's because the name they've chosen for the function fits their model for what a function that performs the necessary task might be called.

And even that is accepting that they model the task itself (as opposed to words and phrases that describe the task) and that they somehow have the capability to reason about that task, which has somehow arisen from a pure language model (whereas humans can, from infancy, actually observe reality, and contemplate the effect of their actions upon the real world around them). Knowing that e.g. the word "oven" often follows the word "hot" is not, in fact, tantamount to understanding heat.

In short, they don't perceive, at all. So how can they be mistaken in their perception?

windward 314 days ago

That page was made in December 2022, requires specifying '(artificial intelligence)' and says:

>(also called bullshitting,[1][2] confabulation,[3] or delusion)[4]

Here's the first linked source:

https://www.psypost.org/scholars-ai-isnt-hallucinating-its-b...

motorest 314 days ago

> That page was made in December 2022, (...)

Irrelevant. Wikipedia does not create concepts. Again, if you take a few minutes to learn about the topic you will eventually understand the concept was coined a couple of decades ago, and has a specific meaning.

Either you opt to learn, or you don't. Your choice.

> Here's the first linked source:

Irrelevant. Your argument is as pointless and silly as claiming rubber duck debugging doesn't exist because no rubber duck is involved.

windward 314 days ago

Uh oh! Let me spend a few minutes to learn about the topic. Thankfully, a helpful Hacker News user has linked me to a useful resource.

I will follow one of the linked sources to the paper 'ChatGPT is bullshit'

>Hicks, M.T., Humphries, J. and Slater, J. (2024). ChatGPT is bullshit. Ethics and information technology, 26(2). doi:https://doi.org/10.1007/s10676-024-09775-5.

Hicks et al. note:

>calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived.

What an enlightening input. I will now follow another source, 'Why ChatGPT and Bing Chat are so good at making things up'

>Edwards, B. (2023). Why ChatGPT and Bing Chat are so good at making things up. [online] Ars Technica. Available at: https://arstechnica.com/information-technology/2023/04/why-a....

Edwards notes:

>In academic literature, AI researchers often call these mistakes "hallucinations." But that label has grown controversial as the topic becomes mainstream because some people feel it anthropomorphizes AI models (suggesting they have human-like features) or gives them agency (suggesting they can make their own choices) in situations where that should not be implied. The creators of commercial LLMs may also use hallucinations as an excuse to blame the AI model for faulty outputs instead of taking responsibility for the outputs themselves.

>Still, generative AI is so new that we need metaphors borrowed from existing ideas to explain these highly technical concepts to the broader public. In this vein, we feel the term "confabulation," although similarly imperfect, is a better metaphor than "hallucination." In human psychology, a "confabulation" occurs when someone's memory has a gap and the brain convincingly fills in the rest without intending to deceive others. ChatGPT does not work like the human brain, but the term "confabulation" arguably serves as a better metaphor because there's a creative gap-filling principle at work

It links to a tweet from someone called 'Yann LeCun':

>Future AI systems that are factual (do not hallucinate)[...] will have a very different architecture from the current crop of Auto-Regressive LLMs.

That was an interesting diversion, but let's go back to learning more. How about 'AI Hallucinations: A Misnomer Worth Clarifying'?

>Maleki, N., Padmanabhan, B. and Dutta, K. (2024). AI Hallucinations: A Misnomer Worth Clarifying. 2024 IEEE Conference on Artificial Intelligence (CAI). doi:https://doi.org/10.1109/cai59869.2024.00033.

Maleki et al. say:

>As large language models continue to advance in Artificial Intelligence (AI), text generation systems have been shown to suffer from a problematic phenomenon often termed as "hallucination." However, with AI’s increasing presence across various domains, including medicine, concerns have arisen regarding the use of the term itself. [...] Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature.

Wow, how interesting! I'm glad I opted to learn that!

My fun was spoiled though. I tried following a link to the 1995 paper, but it was SUPER BORING because it didn't say 'hallucinations' anywhere! What a waste of effort, after I had to go to those weird websites just to be able to access it!

I'm glad I got the opportunity to learn about Hallucinations (Artificial Intelligence) and how they are meaningfully different from bullshit, and how they can be avoided in the future. Thank you!

withinboredom 314 days ago

> Not to disagree, but "non-english" isn't exactly relevant.

how so? programs might use english words but are decidedly not english.

motorest 314 days ago

> how so? programs might use english words but are decidedly not english.

I pointed out the fact that the concept of a language doesn't exist in token predictors. They are trained with a corpus, and LLMs generate outputs that reflect how the input is mapped in accordance with how they were trained on said corpus. Natural language makes the problem harder, but not being English is only relevant in terms of what corpus was used to train them.

mfbx9da4 314 days ago

How can you tell a human actually understands? Prove to me that human thought is not predicting the most probable next token. If it quacks like a duck... In psychology research the only way to find out if a human is happy is to ask them.

alpaca128 314 days ago

Does speaking in your native language, speaking in a second language, thinking about your life and doing maths feel exactly the same to you?

> Prove to me that human thought is not predicting the most probable next token.

Explain the concept of color to a completely blind person. If their brain does nothing but process tokens this should be easy.

> How can you tell a human actually understands?

What a strange question coming from a human. I would say if you are a human with a consciousness you are able to answer this for yourself, and if you aren't no answer will help.

randallsquared 314 days ago

> What a strange question coming from a human.

Oh, I dunno. The whole "mappers vs packers" and "wordcels vs shape rotators" dichotomies point at an underlying truth, which is that humans don't always actually understand what they're talking about, even when they're saying all the "right" words. This is one reason why tech interviewing is so difficult: it's partly a task of figuring out if someone understands, or has just learned the right phrases and superficial exercises.

0xak 314 days ago

That is ill-posed. Take any algorithm at all, e.g. a TSP solver. Make a "most probable next token predictor" that takes the given traveling salesman problem, runs the solver, and emits the first token of the solution, then reruns the solver and emits the next token, and so on.

By this thought experiment you can make any computational process into "predict the most probable next token" - at an extreme runtime cost. But if you do so, you arguably empty the concept "token predictor" of most of its meaning. So you would need to more accurately specify what you mean by a token predictor so that the answer isn't trivially true (for every kind of thought that's computation-like).
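The thought experiment above can be sketched in a few lines of Python. This is only an illustration under assumptions: `solve_tsp` is a hypothetical brute-force solver for tiny inputs, and the "tokens" are simply city indices. The point is that `next_token` has the interface of a next-token predictor while doing nothing but rerunning an ordinary algorithm.

```python
from itertools import permutations
from typing import List, Optional


def solve_tsp(dist: List[List[int]]) -> List[int]:
    """Brute-force the shortest round trip that starts and ends at city 0."""
    n = len(dist)
    best_tour: List[int] = []
    best_len = float("inf")
    for perm in permutations(range(1, n)):
        tour = [0, *perm, 0]
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:  # strict comparison: the first best tour found wins
            best_tour, best_len = tour, length
    return best_tour


def next_token(dist: List[List[int]], emitted: List[int]) -> Optional[int]:
    """'Predict' the next token by rerunning the whole solver on every call."""
    solution = solve_tsp(dist)
    if len(emitted) >= len(solution):
        return None  # end of sequence
    return solution[len(emitted)]


def decode(dist: List[List[int]]) -> List[int]:
    """Autoregressive loop: feed the emitted prefix back in, one token at a time."""
    emitted: List[int] = []
    while (tok := next_token(dist, emitted)) is not None:
        emitted.append(tok)
    return emitted
```

Each call to `next_token` pays the full cost of the solver, which is the "extreme runtime cost" mentioned above, and the output of `decode` is identical to calling `solve_tsp` once.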

exe34 315 days ago

I take it you haven't tried an LLM in a few years?

Night_Thastus 315 days ago

Just a couple of weeks ago on mid-range models. The problem is not implementation or refinement - the core idea is fundamentally flawed.

nativeit 315 days ago

The problem is we’re now arguing with religious zealots. I am not being sarcastic.

gibbitz 310 days ago

I feel this way about TypeScript too. There are a lot of people in engineering these days who don't think critically or exercise full observation when using popular technologies. I don't feel like it was like this 15 years ago, but it probably was...

oinfoalgo 314 days ago

I actually don't know if you are referring to anti-LLM/"ai slop" software engineers or irrationally bullish LLM "the singularity is near" enthusiasts.

Religious fervor in one's own opinion on the state of the world seems to be the zeitgeist.

exe34 315 days ago

that's correct. those who believe only carbon can achieve intelligence.

windward 314 days ago

This stops being an interesting philosophical problem when you recognise the vast complexity of animal brains that LLMs fail to replicate or substitute.

shkkmo 314 days ago

If you stereotype the people who disagree with you, you'll have a very hard time understanding their actual arguments.

exe34 314 days ago

I stopped finding those arguments entertaining after a while. It always ends up "there's something that will always be missing, I just know it, but I won't tell you what. I'm just willing to go round and round in circles."

melagonster 315 days ago

Yes, Carbon do not give them human rights.

exe34 315 days ago

why not the top few? mid-range is such a cop out if you're going to cast doubt.

Night_Thastus 314 days ago

Realistically, most people are not going to be on top end models. They're expensive and will get far, far FAR more expensive once these companies feel they're sufficiently entrenched to crank up pricing.

It's basically the Silicon Valley playbook to offer a service for dirt cheap (completely unprofitable) and then once they secure the market they make the price skyrocket.

A mid range model is what most people will be able to use.

exe34 314 days ago

If you want to know what's possible, you look at the frontier models. If you want it cheap, you wait a year and it'll get distilled into the cheaper ones.

bitwize 315 days ago

> LLMs are not like this. The fundamental way they operate, the core of their design is faulty. They don't understand rules or knowledge. They can't, despite marketing, really reason. They can't learn with each interaction. They don't understand what they write.

Said like a true software person. I'm given to understand that computer people are looking at LLMs from the wrong end of the telescope; and that from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs. The only difference between a brain and an LLM may be the size of its memory, and what kind and quality of data it's trained on.

Night_Thastus 315 days ago

>from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs

Hahahahahaha.

Oh god, you're serious.

Sure, let's just completely ignore all the other types of processing that the brain does. Sensory input processing, emotional regulation, social behavior, spatial reasoning, long and short term planning, the complex communication and feedback between every part of the body - even down to the gut microbiome.

The brain (human or otherwise) is incredibly complex and we've barely scraped the surface of how it works. It's not just neurons (which are themselves complex), it's interactions between thousands of types of cells performing multiple functions each. It will likely be hundreds of years before we get a full grasp on how it truly works - if we ever do at all.

N_Lens 313 days ago

Reminds me of that old chestnut - “if the human brain were simpler to understand, we wouldn’t be smart enough to do so”

fzeroracer 315 days ago

> The only difference between a brain and an LLM may be the size of its memory, and what kind and quality of data it's trained on.

This is trivially proven false, because LLMs have far larger memory than your average human brain and are trained on far more data. Yet they do not come even close to approximating human cognition.

alternatex 314 days ago

>are trained on far more data

I feel like we're underestimating how much data we as humans are exposed to. There's a reason AI struggles to generate an image of a full glass of wine. It has no concept of what wine is. It probably knows way more theory about it than any human, but it's missing the physical.

In order to train AIs the way we train ourselves, we'll need to give it more senses, and I'm no data scientist but that's presumably an inordinate amount of data. Training AI to feel, smell, see in 3D, etc is probably going to cost exponentially more than what the AI companies make now or ever will. But that is the only way to make AI understand rather than know.

We often like to state how much more capacity for knowledge AI has than the average human, but in reality we are just underestimating ourselves as humans.

gibbitz 310 days ago

I think this conversation is dancing around the relationship of memory and knowledge. Simply storing information is different than knowing it. One of you is thinking book learning while the other is thinking street smarts.

zahlman 314 days ago

> and that from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs

Can you cite at least one recognized, credible neuroscientist who makes this claim?

imtringued 314 days ago

Look you don't have to lie at every opportunity you get. You are fully aware and know what you've written is bullshit.

Tokens are a highly specific transformer exclusive concept. The human brain doesn't run a byte pair encoding (BPE) tokenizer [0], nor does it encode anything as tokens. It uses asynchronous time varying spiking analog signals. Humans are the inventors of human languages and are not bound to any static token encoding scheme, so this view of what humans do as "token prediction" requires either a gross misrepresentation of what a token is or what humans do.

If I had to argue that humans are similar to anything in machine learning research specifically, I would have to argue that they extremely loosely follow the following principles:

* reinforcement learning with the non-brain parts defining the reward function (primarily hormones and pain receptors)

* an extremely complicated non-linear Kalman filter that not only estimates the current state of the human body, but also "estimates" the parameters of a sensor fusing model

* there is a necessary projection of the sensor fused result that then serves as available data/input to the reinforcement learning part of the brain

Now here are two big reasons why the model I describe is a better fit:

The first reason is that I am extremely loose and vague. By playing word games I have weaseled myself out of any specific technology and am on the level of concepts.

The second reason is that the Kalman filter concept here is general enough that it also includes predictor models, but the predictor model here is not the output that drives human action, because that would logically require the dataset to already contain human actions. That is what you did: you assume that all learning is imitation learning.

In my model, any internal predictor model that is part of the Kalman filter is used to collect data, not drive human action. Actions like eating or drinking are instead driven by the state of the human body, e.g. hunger is controlled through leptin and insulin and others. All forms of work, no matter how much of a detour they represent, ultimately have the goal of feeding yourself or your family (=reproduction).
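For readers who have not met the Kalman filter invoked above, here is a minimal scalar sketch in Python. It is a deliberately simplified, linear, one-dimensional version estimating a constant value from noisy readings, nothing like the "extremely complicated non-linear" filter the comment has in mind. The names `kalman_step` and `run_filter` and all default values are illustrative assumptions.

```python
from typing import Iterable, Tuple


def kalman_step(
    estimate: float,
    variance: float,
    measurement: float,
    meas_var: float,
    process_var: float = 0.0,
) -> Tuple[float, float]:
    """One predict/update cycle for a scalar state assumed to stay constant."""
    # Predict: the state itself does not move, only its uncertainty may grow.
    variance = variance + process_var
    # Update: blend prediction and measurement, weighted by their uncertainties.
    gain = variance / (variance + meas_var)
    estimate = estimate + gain * (measurement - estimate)
    variance = (1.0 - gain) * variance
    return estimate, variance


def run_filter(
    measurements: Iterable[float],
    initial_estimate: float = 0.0,
    initial_variance: float = 1.0,
    meas_var: float = 1.0,
) -> Tuple[float, float]:
    """Fold a sequence of noisy measurements into a single estimate and variance."""
    estimate, variance = initial_estimate, initial_variance
    for z in measurements:
        estimate, variance = kalman_step(estimate, variance, z, meas_var)
    return estimate, variance
```

The filter's internal prediction exists only to be compared against the next measurement and corrected, which is the role the comment assigns to predictor models: collecting and refining data, not choosing actions.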

[0] A BPE tokenizer is a piece of human written software that was given a dataset to generate an efficient encoding scheme and the idea itself is completely independent of machine learning and neural networks. The fundamental idea behind BPE is that you generate a static compression dictionary and never change it.
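The footnote's point, that BPE builds a static dictionary once and then only applies it, can be illustrated with a simplified character-level sketch. Production tokenizers typically operate on bytes, pre-split text into words, and use faster data structures; the function names `learn_merges`, `apply_merge`, and `encode` are made up for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def apply_merge(symbols: Sequence[str], pair: Pair) -> List[str]:
    """Replace each left-to-right, non-overlapping occurrence of `pair` with one merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_merges(text: str, num_merges: int) -> List[Pair]:
    """Learn an ordered list of merges by repeatedly fusing the most frequent adjacent pair."""
    symbols = list(text)
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        symbols = apply_merge(symbols, best)
    return merges


def encode(text: str, merges: Sequence[Pair]) -> List[str]:
    """Tokenize new text with the frozen merge list; nothing is learned at this stage."""
    symbols = list(text)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols
```

Note that `encode` never updates `merges`: once the dictionary is learned from the training text, every later input is split with the same fixed rules, and no neural network is involved at any point.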

zahlman 314 days ago

> Look you don't have to lie at every opportunity you get. You are fully aware and know what you've written is bullshit.

As much as I may agree with your subsequent claims, this is not how users are expected to engage with each other on HN.

N_Lens 313 days ago

You seem to be an LLM