by FeepingCreature 176 days ago
This is true though. While we know what they do on a mechanistic level, we cannot reliably analyze why the model outputs any particular answer in functional terms without a heroic effort at the "arxiv paper" level.

that’s true of analyzing individual atoms in a combustion engine — yet I doubt you’d claim we don’t know how they work

also this went from “we can’t analyze” to “we can’t analyze reliably [without a lot of effort]” quite quickly

In the digital world, we should be able to go back from output to input unless the intention of the function is to "not do that". Like hashing.
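
To make the hashing contrast concrete, here is a minimal sketch using only Python's standard library. The message and variable names are made up for illustration: base64 is an encoding whose output can be decoded back to the input, while SHA-256 is designed so that no such decode step exists.

```python
import base64
import hashlib

message = b"hello world"  # illustrative input, not from the thread

# Reversible: base64 is an encoding, so decoding recovers the exact input.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# One-way by design: SHA-256 maps any input to a fixed 32-byte digest,
# and the library offers no function to go from digest back to input.
digest = hashlib.sha256(message).hexdigest()

print(decoded == message)  # round trip succeeds
print(len(digest))         # 64 hex characters regardless of input length
```

Plenty of ordinary functions also discard information (anything that maps many inputs to one output), so treat this as two ends of a spectrum rather than a rule.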

LLMs not being able to go from output back to input deterministically, and our not understanding why, is very important; most of our issues with LLMs stem from this. It's why mechanistic interpretability research is so hot right now.

The car analogy is not good because models are digital components and a car is a real world thing. They are not comparable.

ah I forgot digital components are not real world things
I mean, fluid dynamics is an unsolved issue. But even so we know *considerably* less about how LLMs work in functional terms than about how combustion engines work.
I outright disagree; we know how LLMs work
We know how neural nets work. We don't know how a specific combination of weights in the net is capable of coherently answering questions asked in a natural language, though. If we did, we could replicate what it does without training it.
> We know how neural nets work. We don't know how a specific combination of weights in the net is capable of coherently answering questions asked in a natural language, though.

these are the same thing. the neural network is trained to predict the most likely next word (or rather token, etc.). that’s how it works. that’s it. you train a neural network on data, it learns the function you trained it to learn, and it “acts” like the data. have you actually studied neural networks? do you know how they work? I’m confused why you and so many others are seemingly so confused by this. what fundamentally are you asking for to meet the criteria of knowing how LLMs work? some algorithm that can look at weights and predict if the net will output “coherent” text?
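
For a toy version of "trained to predict the most likely next word", here is a rough sketch of a bigram counter. It is not how an LLM is implemented (no neural network, no gradient descent); the corpus and the function names `train_bigram` and `predict_next` are invented for illustration. It only shows the objective being described: learn from data which token tends to come next.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"  # made-up training data
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

In this toy the whole learned "function" is a readable table of counts, which is the part that does not carry over to billions of weights.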

> If we did, we could replicate what it does without training it.

not sure what this is supposed to mean

It's like you're describing a compression program as "it takes a big file and returns a smaller file by exploiting regularities in the data." Like, you have accurately described what it does, but you have in no way answered the question of how it does that.

If you then explain the function of a CPU and how ELF binaries work (which is the equivalent of trying to answer the question by explaining how neural networks work), you still have not answered the actually important question! Which is "what are the algorithms that LLMs have learnt that allow them to (apparently) converse and somewhat reason like humans?"
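
For the compression side of the analogy, this is roughly what an answer to "how does it do that" looks like when the algorithm is known. It is a minimal run-length encoding sketch, chosen only because it is short; the names `rle_encode` and `rle_decode` and the sample string are illustrative, and real compressors use far more elaborate schemes.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of a repeated character into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

data = "aaaabbbcca"  # made-up input with obvious regularities
packed = rle_encode(data)
print(packed)
print(rle_decode(packed) == data)  # decoding restores the input exactly
```

Here the "regularity being exploited" is spelled out in a few lines anyone can read. The open question in the thread is what the equivalent description would be for whatever a trained LLM has learnt.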