Hacker News
by xg15 187 days ago
> LLMs process information very differently. They look at everything in parallel, all at once, and can use the whole context in one shot. Their “memory” is stored across billions of tiny weights, and they retrieve information by matching patterns, not by searching through memories like we do. Researchers have shown that LLMs automatically learn specific little algorithms (like copying patterns or doing simple lookups), all powered by huge matrix multiplications running in parallel rather than slow, step-by-step reasoning.

I think this is incorrect on two counts. Yes, the computation inside an individual transformer layer is parallel, but the network as a whole is not. At the first level, it's obviously sequential over generated tokens. But even the generation of a single token is sequential in the number of layers the information has to travel through.
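To make the two sequential dependencies concrete, here is a toy sketch in Python. It is not a real transformer: `layer`, `forward` and `generate` are made-up names, the "layer" just adds 1 to every position, and the "sampling" is a modulo. It only counts how many steps have to happen one after another, and it ignores things like KV caching, which save work but (as far as I can tell) don't remove either dependency.

```python
def layer(states):
    # Inside one layer, every position can be computed independently,
    # so this comprehension stands in for the parallel part.
    return [s + 1 for s in states]


def forward(tokens, num_layers):
    # One token's worth of computation: layer k+1 needs the output of
    # layer k, so this loop cannot be parallelized across layers.
    states = list(tokens)
    sequential_steps = 0
    for _ in range(num_layers):
        states = layer(states)
        sequential_steps += 1
    next_token = states[-1] % 10  # toy stand-in for sampling a token
    return next_token, sequential_steps


def generate(prompt, num_new_tokens, num_layers):
    # Autoregressive decoding: token t+1 needs token t as input,
    # so this loop cannot be parallelized across generated tokens.
    tokens = list(prompt)
    total_steps = 0
    for _ in range(num_new_tokens):
        next_token, steps = forward(tokens, num_layers)
        tokens.append(next_token)
        total_steps += steps
    return tokens, total_steps


tokens, total_steps = generate([1, 2, 3], num_new_tokens=4, num_layers=6)
print(len(tokens), total_steps)
```

In this toy, the length of the sequential chain is `num_new_tokens * num_layers`, no matter how wide each layer is. The width is the only part that the "everything in parallel" description actually covers.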

I believe both of those constraints are comparable to the way humans think. (The human brain doesn't have neatly organized layers, but it does have "pathways" where certain brain regions project into other brain regions.)