by andreyk
737 days ago
Exciting to see this so soon after Anthropic's "Mapping the Mind of a Large Language Model" (under 3 weeks). I find these efforts really exciting; it is still common to hear people say "we have no idea how LLMs / Deep Learning works", but that is really a gross generalization, as stuff like this shows. I wonder if this was rushed out a bit in response to Anthropic's release (as well as the departure of Jan Leike from OpenAI)... the paper link doesn't even go to arXiv, and the analysis is not nearly as deep. Though who knows, it might be unrelated.

"We currently don't understand how to make sense of the neural activity within language models."
"Unlike with most human creations, we don’t really understand the inner workings of neural networks."
"The [..] networks are not well understood and cannot be easily decomposed into identifiable parts"
"[..] the neural activations inside a language model activate with unpredictable patterns, seemingly representing many concepts simultaneously"
"Learning a large number of sparse features is challenging, and past work has not been shown to scale well."
etc., etc., etc.
People say we don't (currently) know why they output what they output because, as the article clearly states, we don't.