Hacker News
by barrkel 664 days ago
Why would someone expect interacting with a local LLM to teach anything about inference?

Interacting with a local LLM develops one's intuitions about how LLMs work, what they're good for (appropriately scaled to model size) and how they break, and gives you ideas about how to use them as a tool in bigger applications without getting bogged down in API billing etc.

Assuming s/would/wouldn't: If you are super smart, then perhaps you can intuit details about how they work under the hood. Otherwise you are working with a mental model that is likely to be much more faulty than the one you would develop by learning through study.

Knowing the specific multiplies and QKV and how attention works doesn't develop your intuition for how LLMs work. Knowing that the effective output is a list of tokens with associated probabilities is of marginal use. Knowing about rotary position embeddings, temperature, batching, beam search, different techniques for preventing repetition and so on doesn't really develop intuition about behavior; rather, it improves the worst cases (babbling, repeating nonsense in the absolute worst case). But you wouldn't know that at all from first principles without playing with the things.
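To make the "list of tokens with associated probabilities" and "techniques for preventing repetition" concrete, here is a minimal sketch of one sampling step. It assumes the model has already produced raw logits for each token id, and it uses a CTRL-style repetition penalty as one common example of such a technique (not the only one). The function names are illustrative, not any particular library's API.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_repetition_penalty(logits, previous_tokens, penalty=1.2):
    # CTRL-style penalty: make tokens that already appeared less likely.
    # Positive logits are divided and negative ones multiplied, so with
    # penalty > 1 the adjustment always pushes the logit downward.
    adjusted = list(logits)
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

def sample_next_token(logits, previous_tokens, temperature=0.8, penalty=1.2, rng=random):
    logits = apply_repetition_penalty(logits, previous_tokens, penalty)
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Lower temperature sharpens the distribution; higher flattens it.
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, with logits `[2.0, 1.0, 0.5]` greedy decoding picks token 0, but if token 0 was already generated and the penalty is 3.0, its logit drops below token 1's and token 1 wins. None of this tells you how a given model will behave; it only shapes the worst cases, which is the point being made above.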

The truth is that the inference implementation is more like a VM, and the interesting thing is the model, the set of learned weights. It's like a program being executed one token at a time. How that program behaves is the interesting thing. How it degrades. What circumstances it behaves really well in, and its failure modes. That's the thing where you want to be able to switch and swap a dozen models around and get a feel for things, have forking conversations, etc. It's what LM Studio is decent at.
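A minimal sketch of the "inference is a VM, the model is the program" idea, assuming a model is just a function from the tokens so far to next-token logits. The two bigram "models" below are hypothetical toys standing in for real weights; the point is that the decode loop never changes, while swapping the model changes the behavior entirely.

```python
from typing import Callable, Dict, List

# A "model" here is any function mapping the tokens so far to next-token logits.
Model = Callable[[List[int]], List[float]]

def generate(model: Model, prompt: List[int], max_new_tokens: int, eos: int) -> List[int]:
    # The "VM": the same loop runs whichever model is plugged in.
    # Run the model, pick a token (greedily here), append it, repeat.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)
        next_tok = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_tok)
        if next_tok == eos:
            break
    return tokens

def make_bigram_model(table: Dict[int, int], vocab_size: int) -> Model:
    # Hypothetical toy "weights": the last token determines its preferred successor.
    def model(tokens: List[int]) -> List[float]:
        logits = [0.0] * vocab_size
        logits[table[tokens[-1]]] = 1.0
        return logits
    return model

# Two "programs" for the same "VM" (token 3 acts as end-of-sequence).
well_behaved = make_bigram_model({0: 1, 1: 2, 2: 3, 3: 3}, vocab_size=4)
degenerate = make_bigram_model({0: 0, 1: 0, 2: 0, 3: 3}, vocab_size=4)
```

Here `generate(well_behaved, [0], max_new_tokens=10, eos=3)` walks to the end token and stops, while `generate(degenerate, [0], max_new_tokens=4, eos=3)` just repeats token 0 until the budget runs out. Same loop, different "program", different failure mode, which is the kind of thing you only notice by running and swapping models.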

But those things are all so cool though. Like... how could you not want to learn about them.

Seriously though, I guess I'm just kind of uncomfortable with "treating inference implementation like a VM" as you put it. It seems like a bad idea. We are turning implementation details into user interfaces in a space that is undergoing such rapid and extreme change. Like, people spent a lot of time learning the Stable Diffusion web UI, and then Flux came out and upended the whole space. But maybe foundational knowledge isn't as valuable as I'm thinking and it's fine that people just re-learn whatever UIs emerge, I don't know.

You can also learn how a user will approach prompting.