Hacker News
by lappa 359 days ago
This isn't suggesting that no one understands how these models are architected, nor is anyone saying that scaled dot-product attention (SDPA) and matrix multiplication aren't understood by those who create these systems.

What's being said is that the result of training and the way in which information is processed in latent space is opaque.

There are strategies to dissect a model's inner workings, but this is an active and still incomplete field of research.

1 comment

Whatever comes out of any LLM will directly depend upon the data you fed it and which answers you reinforced as correct. There is nothing unknown or mystical about it.
The same could be said of people, revealing the emptiness of this idea. Knowing the process at a mechanism level says nothing about the outcome. Some people output German, some English. Its sub-mechanisms are plastic and emergent.