| HN Mirror



	by samtp 332 days ago
	This is the exact same issue that I've had trying to use LLMs for anything that needs to be precise such as multi-step data pipelines. The code it produces will look correct and produce a result that seems correct. But when you do quality checks on the end data, you'll notice that things are not adding up. So then you have to dig into all this overly verbose code to identify the 3-4 subtle flaws with how it transformed/joined the data. And these flaws take as much time to identify and correct as just writing the whole pipeline yourself.
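To illustrate the kind of end-data quality check described above, here is a minimal sketch in plain Python. It is not the commenter's pipeline: the `orders` and `customers` tables, the `inner_join` helper, and the `duplicate_keys` helper are all hypothetical. It shows how a duplicated key on one side of a join silently inflates a total, and how comparing totals before and after the join exposes it.

```python
from collections import Counter


def inner_join(left, right, key):
    """Naive inner join of two lists of dicts on `key` (illustrative helper)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # One output row per matching (left, right) pair, so duplicates fan out.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def duplicate_keys(rows, key):
    """Return the key values that occur more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical data: customer 2 appears twice in the lookup table.
orders = [
    {"order_id": 1, "customer_id": 1, "amount": 100},
    {"order_id": 2, "customer_id": 2, "amount": 50},
]
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 2, "region": "US-West"},
]

joined = inner_join(orders, customers, "customer_id")

# Quality checks: the joined output looks plausible row by row,
# but the totals no longer add up.
total_before = sum(o["amount"] for o in orders)
total_after = sum(r["amount"] for r in joined)
fan_out_detected = total_after != total_before
```

Here the join yields three rows instead of two, and the summed amount grows from 150 to 200. Running `duplicate_keys(customers, "customer_id")` before the join would have flagged the problem up front.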

5 comments

torginus 332 days ago

I'll get into hot water with this, but I still think LLMs do not think like humans do - as in the code is not the result of trying to recreate a correct thought process in a programming language, but some sort of statistically most likely string that matches the input requirements.

I used to have a non-technical manager like this - he'd watch out for the words I (and other engineers) said and in what context, and would repeat them back mostly in accurate word contexts. He sounded remarkably like he knew what he was talking about, but would occasionally make a baffling mistake - like mixing up CDN and CSS.

LLMs are like this. I often see Cursor with Claude making the same kind of strange mistake, only to catch itself in the act and fix the code (but what happens when it doesn't?).

vidarh 332 days ago

I think that if people say LLMs can never be made to think, that is bordering on a religious belief - it'd require humans to exceed the Turing computable (note also that saying they never can is very different from believing current architectures never will - it's entirely reasonable to believe it will take architectural advances to make it practically feasible).

But saying they aren't thinking yet or like humans is entirely uncontroversial.

Even most maximalists would agree at least with the latter, and the former largely depends on definitions.

As someone who uses Claude extensively, I think of it almost as a slightly dumb alien intelligence - it can speak like a human adult, but makes mistakes a human adult generally wouldn't, and that combination breaks the heuristics we use to judge competency, and often leads people to overestimate these models.

Claude writes about half of my code now, so I'm overall bullish on LLMs, but it saves me less than half of my time.

The savings improve as I learn how to better judge what it is competent at, and where it merely sounds competent and needs serious guardrails and oversight, but there's certainly a long way to go before it'd make sense to argue they think like humans.

plaguuuuuu 332 days ago

Everyone has this impression that our internal monologue is what our brain is doing. It's not. We have all sorts of individual components that exist totally outside the realm of "token generation". E.g. the amygdala does its own thing in handling emotions/fear/survival, and fires in response to anything that triggers emotion. We can modulate that with our conscious brain, but not directly - we have to basically hack the amygdala by thinking thoughts that deal with the response (don't worry about the exam, you've studied for it already).

LLMs don't have anything like that. Part of why they aren't great at some aspects of human behaviour. E.g. coding, choosing an appropriate level of abstraction - no fear of things becoming unmaintainable. Their approach is weird when doing agentic coding because they don't feel the fear of having to start over.

Emotions are important.

vidarh 331 days ago

Unless we exceed the Turing computable - which there isn't the tiniest shred of evidence for - nothing we do is "outside the realm of 'token generation'". There is no reason why the token stream generated needs to be treated as equivalent to an internal monologue, or needs to always be used to produce language at all, and Turing complete systems are computationally equivalent (they can all compute the same set of functions).

> Everyone has this impression that our internal monologue is what our brain is doing.

Not everyone has an internal monologue, so that would be utterly bizarre. Some people might believe this, but it is by no means relevant to Turing equivalence.

> Emotions are important.

Unless we exceed the Turing computable, our experience of emotions would be evidence that any Turing complete system can be made to act as if it experiences emotions.

wat10000 331 days ago

A token stream is universal, but I don't see any reason to think that a token stream generated by an LLM can ever be universal.

I mean, theoretically in an "infinite tape" model, sure. But we don't even know if it's physically possible. Given that the observable universe is finite and the information capacity of a finite space is also finite, then anything humans can do can theoretically be encoded with a lookup table, but that doesn't mean that human thought can actually be replicated with a lookup table, since the table would be vastly larger than the observable universe can store.
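As a rough worked example of how quickly a lookup table outgrows physical storage, the sketch below assumes the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe (an assumption added here, not a figure from the comment) and one table entry per possible input. It finds the smallest input width in bits at which the number of entries exceeds that atom count.

```python
# Assumption: ~10**80 atoms in the observable universe (rough, commonly
# cited order-of-magnitude estimate; used only for illustration).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10**80


def min_bits_exceeding(limit):
    """Smallest n such that a table with one entry per n-bit input
    (2**n entries) has more entries than `limit`."""
    n = 0
    while 2**n <= limit:
        n += 1
    return n


min_bits = min_bits_exceeding(ATOMS_IN_OBSERVABLE_UNIVERSE)
min_bytes = (min_bits + 7) // 8  # round up to whole bytes
```

Under that assumption, `min_bits` is 266, so a table keyed on inputs of only 34 bytes already needs more entries than there are atoms to store them in, even at one entry per atom.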

LLMs look like the sort of thing that could replicate human thought in theory (since they are capable of arbitrary computation if you give them access to infinite memory) but not the sort of thing that could do it in a physically possible way.

vidarh 331 days ago

Unless humans exceed the Turing computable, the human brain is the existence proof that a sufficiently complex Turing machine can be made to replicate human thought in a compact space.

That encoding a naive/basic UTM in an LLM would potentially be impractical is largely irrelevant in that case, because for any UTM you can "compress" the program by increasing the number of states or symbols, and effectively "embedding" the steps required to implement a more compact representation in the machine itself.

While it is possible that current LLM architectures make it impossible to encode a model efficient enough to be physically practical, there's no reasonable basis for assuming this approach cannot translate.

marcellus23 332 days ago

I don't think you'll get into hot water for that. Anthropomorphizing LLMs is an easy way to describe and think about them, but anyone serious about using LLMs for productivity is aware they don't actually think like people, and run into exactly the sort of things you're describing.

MattSayar 332 days ago

I just wrote a post on my site where the LLM had trouble with 1) clicking a button, 2) taking a screenshot, 3) repeat. The non-deterministic nature of LLMs is both a feature and a bug. That said, read/correct can sometimes be a preferable workflow to create/debug, especially if you don't know where to start with creating.

nemomarx 332 days ago

I think it's basically equivalent to giving that prompt to a low paid contractor coder and hoping their solution works out. At least the turnaround time is faster?

But normally you would want a more hands-on back and forth to ensure the requirements actually capture everything, validation that the results are good, layers of reviews, etc., right?

samtp 332 days ago

It seems to be a mix between hiring an offshore/low level contractor and playing a slot machine. And by that I mean at least with the contractor you can pretty quickly understand their limitations and see a pattern in the mistakes they make. While an LLM is obviously faster, the mistakes are seemingly random so you have to examine the result much more than you would with a contractor (if you are working on something that needs to be exact).

dingnuts 332 days ago

the slot machine is apt. insert tokens, pull lever, ALMOST get a reward. Think: I can start over, manually, or pull the lever again. Maybe I'll get a prize if I pull it again...

and of course, you pay whether the slot machine gives a prize or not. Between the slot machine psychological effect and sunk cost fallacy I have a very hard time believing the anecdotes -- and my own experiences -- with paid LLMs.

Often I say, I'd be way more willing to use and trust and pay for these things if I got my money back for output that is false.

sethops1 332 days ago

If the contractor is producing unusable code, they won't be my contractor anymore.

stpedgwdgfhgdd 332 days ago

In my experience, using small steps and a lot of automated tests works very well with CC. Don’t go for these huge prompts that have a complete feature in them.

Remember the title “attention is all you need”? Well you need to pay a lot of attention to CC during these small steps and have a solid mental model of what it is building.

samtp 331 days ago

Yeah but once you break things down into small enough steps you might as well just code it yourself.

casey2 331 days ago

It's more likely that, because you didn't do the work, you haven't already justified the flaws to yourself.