by SushiHippie 746 days ago

This seems to be what Anthropic and OpenAI did in their research:

Golden Gate Claude - https://news.ycombinator.com/item?id=40459543 - (60 comments, 16 days ago)

Extracting Concepts from GPT-4 - https://news.ycombinator.com/item?id=40599749 (144 comments, 2 days ago)

2 comments

lakshith-403 746 days ago

Interesting. I think OpenAI here uses sparse autoencoders to map out sparse activation patterns in the network, comparing them to how a real person reasons about a situation.

Inspectus, on the other hand, is a general tool to visualize how transformer models pay attention to different parts of the data they process.
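To make "how a model pays attention" concrete, here is a minimal sketch of the attention-weight matrix that a tool like this would plot. It is plain scaled dot-product attention in pure Python, not Inspectus's API; the tokens, the toy query/key vectors, and the function names are made up for illustration.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    # Row i holds how strongly token i attends to every token j:
    # softmax over j of (q_i . k_j) / sqrt(d).
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

# Hypothetical toy inputs: three tokens with 2-dimensional vectors.
tokens = ["the", "cat", "sat"]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = attention_weights(queries, keys)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each row sums to 1, so the matrix can be drawn directly as a heatmap with query tokens on one axis and key tokens on the other.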


dimatura 746 days ago

That OpenAI work is more elaborate. It trains an additional network in such a way that it encodes what GPT is doing in terms of activations, but in a more interpretable way (hopefully). Here, as far as I can tell, Inspectus is visualizing the activations of the attention layers directly.
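For anyone wondering what "trains an additional network" looks like, here is a rough sketch of a sparse autoencoder's forward pass and loss in pure Python. This is a generic version with an L1 sparsity penalty, which is one common formulation and not necessarily what OpenAI used; the weights, the input vector, and `l1_coeff` are all made-up illustration values, and no training loop is shown.

```python
def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def sae_forward(x, W_enc, b_enc, W_dec):
    # Encode the activation vector into a wider, mostly-zero feature vector,
    # then decode it back into the original activation space.
    h = relu([a + b for a, b in zip(matvec(W_enc, x), b_enc)])
    x_hat = matvec(W_dec, h)
    return h, x_hat

def sae_loss(x, x_hat, h, l1_coeff):
    # Reconstruction error plus an L1 penalty that pushes features toward zero.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    sparsity = l1_coeff * sum(abs(v) for v in h)
    return recon + sparsity

# Hypothetical 2-dimensional "model activation" and hand-picked weights.
x = [1.0, -1.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 2 -> 4 features
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]      # 4 -> 2

h, x_hat = sae_forward(x, W_enc, b_enc, W_dec)
loss = sae_loss(x, x_hat, h, l1_coeff=0.1)
print("features:", h)
print("reconstruction:", x_hat)
print("loss:", loss)
```

The idea is that only a few entries of `h` are nonzero for any given input, so each active feature is easier to inspect than the raw activations.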
