Hacker News
by tikkun 1120 days ago
> Are there papers that peruse what kind of concepts the model is actually building/learning in those heads and layers?

> There are large teams who spend months tuning those models. Do those teams have access to those internal concepts that the model built up and organized? Is any of this work public?

See: https://openai.com/research/language-models-can-explain-neur...

My understanding: Generally, the models are compressing their understanding of all text, and in doing so, they're learning high order concepts that make their compression of the pre-training text better: more compressed, with less loss.
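To make the compression framing concrete, here is a toy sketch (my own illustration, not taken from the linked paper). It assumes an ideal entropy coder that spends -log2(p) bits on a symbol the model predicted with probability p, so a model that predicts the text better yields a shorter encoding. The functions `unigram_bits` and `bigram_bits` are hypothetical names, and both models are fit on the same text they encode, so this ignores the cost of storing the model itself.

```python
import math
from collections import Counter, defaultdict

def unigram_bits(text):
    """Ideal code length in bits when each character is coded by its overall frequency."""
    counts = Counter(text)
    total = len(text)
    return sum(-math.log2(counts[ch] / total) for ch in text)

def bigram_bits(text):
    """Ideal code length in bits when each character is coded given the previous character."""
    pair_counts = defaultdict(Counter)
    for prev, ch in zip(text, text[1:]):
        pair_counts[prev][ch] += 1
    # The first character has no context, so code it with the unigram frequency.
    counts = Counter(text)
    bits = -math.log2(counts[text[0]] / len(text))
    for prev, ch in zip(text, text[1:]):
        ctx = pair_counts[prev]
        bits += -math.log2(ctx[ch] / sum(ctx.values()))
    return bits

# Toy corpus (an assumption for illustration only).
text = "the cat sat on the mat. the cat ate the rat. " * 20
print(f"unigram model: {unigram_bits(text):.0f} bits")
print(f"bigram model:  {bigram_bits(text):.0f} bits")
```

The bigram model has picked up a little structure (which characters tend to follow which), and that alone shrinks the encoding. The suggestion in the comment above is that an LLM is doing a far richer version of the same thing, where the "structure" may include higher-level concepts.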

1 comment

> Generally, the models are compressing their understanding of all text, and in doing so, they're learning high order concepts

Are these higher order concepts accessible to us? E.g. can we list those learned concepts?

(Re-reading the paper you linked now...)

My understanding is that the answer is generally: not yet.

(I wish we could. I suspect we'll be able to learn some interesting things about the universe, about humans, and so on, by seeing which concepts LLMs found to be highly explanatory / high order.)