| HN Mirror



	by zozbot234 47 days ago
	It's not really a capability, it's more like a very costly hack and they make that very clear in the paper. Training two models (an encoder and a decoder) for the purpose of explaining a single layer at a time is not that sensible. It's neat that you can generate so much readable text about how the LLM decodes partial input, and I suppose it gives you some extra debugging ability, but that's all there is to it.

2 comments

phire 47 days ago

The NLA also hallucinates, so it's still not revealing the model's actual "thoughts". The paper also points out that since the NLA is a full LLM, it can make inferences that aren't actually in the activations.

But it's a useful approximation for auditing.


semiquaver 46 days ago

Why does it being a “costly hack” make it “not a capability?”

Using your logic, LLMs, which are very fairly described as “costly” and “a hack”, do not themselves constitute a useful capability, which I hope most people would agree is obviously false.
