| HN Mirror



	by vintagedave 119 days ago
	This is fantastic to read. LLMs feel like black boxes and for the large ones especially I have a sense they genuinely form concepts. Yet the internals were opaque. I remember reading how LLMs cannot explain their own behaviour when asked. I feel this would give insight into all that including the degree of true conceptualisation. I’m curious if this can demonstrate what else the model is aware of when answering, too.

1 comment

adebayoj 119 days ago

Our decomposition allows us to answer questions like this: for 84 percent of the model's representation, we know it is relying on this concept to give an answer.

We can also trace its behavior to the training data that led to it, which can show us where some of these concepts come from.
