

by cbolton 448 days ago

I think that's a very unfair take. As a summary for non-experts, I found it did a great job of explaining how, by analyzing which features activate in the model, you can get an idea of what it's doing to produce the answer. It also explained how, by manually intervening to change these activations, you can test hypotheses about causality.

It sounds like you don't like anthropomorphism. I can relate, but I don't see where "It's a bit like there is the great and powerful man behind the curtain, let's trace the thought of this immaculate being you mere mortals" is coming from. In most cases the anthropomorphisms are just the standard way to convey an idea briefly. Even so, I liked how they sometimes used scare quotes, as in: it began "thinking" of potential on-topic words. There are some more debatable anthropomorphisms, such as "in its head", where they use scare quotes systematically.

Also, given that they took inspiration from neuroscience to develop a technique that appears successful in analyzing their model, I think they deserve some leeway on the anthropomorphism front. Or at least on the "biological metaphors" front, which is maybe not quite the same thing.

I used to think biological metaphors for LLMs were misleading, but I'm actually revising this opinion now. I still think the metaphors I'd seen before were misleading, but here, seeing the activation pathways they were able to identify, including the inhibitory circuits, and knowing a bit about similar structures in the brain, I find the metaphor appropriate.