Hacker News
by jsenn 551 days ago
> Those layers have different representation space.

Do they? Interpretability techniques like the Logit Lens [1] wouldn't work if that were the case. The author of [1] found that, at least for GPT-2, the network almost immediately transforms its hidden state into a "logitable" form: you can decode the hidden state after any layer with the model's final unembedding and watch each layer incrementally refine the next-token prediction.
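To make the idea concrete, here is a minimal toy sketch of the logit-lens readout, not the code from [1]. It assumes a made-up 4-token vocabulary, a 4-dimensional residual stream, hand-picked per-layer updates, and a final LayerNorm without learned gain/bias. To keep the arithmetic checkable, the hypothetical unembedding `W_U` is the identity, so each token's direction is one coordinate. The only point is the mechanism: every intermediate residual-stream state is pushed through the *final* LayerNorm and unembedding, as if it were the last layer. On a real GPT-2 checkpoint you would use the model's own final LayerNorm and unembedding (in Hugging Face `transformers`, `transformer.ln_f` and `lm_head`) instead of these toy values.

```python
import math

# Hypothetical toy model: 4-token vocabulary, 4-dim residual stream.
VOCAB = ["cat", "sat", "on", "mat"]

# Toy unembedding matrix W_U (d_model x vocab). Identity for checkable arithmetic;
# these are NOT weights from any real model.
W_U = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical contributions added to the residual stream at each step.
UPDATES = [
    [1.0, 0.0, 0.0, 0.0],   # token embedding
    [0.0, 0.5, 0.0, 0.8],   # layer 1 output
    [0.0, 0.0, 0.0, 0.6],   # layer 2 output
    [-0.5, 0.0, 0.0, 1.0],  # layer 3 output (last layer)
]


def layer_norm(x, eps=1e-5):
    """Final LayerNorm without learned gain/bias (a simplifying assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unembed(h, w_u):
    """Apply the final LayerNorm, then the unembedding: logits = LN(h) @ W_U."""
    normed = layer_norm(h)
    return [sum(normed[i] * w_u[i][j] for i in range(len(normed)))
            for j in range(len(w_u[0]))]


def logit_lens(residual_updates, w_u, vocab):
    """Decode the residual stream after every step using the *final* unembedding."""
    h = [0.0] * len(w_u)  # residual stream starts at zero
    readouts = []
    for update in residual_updates:
        h = [a + b for a, b in zip(h, update)]  # residual connection: h <- h + f(h)
        probs = softmax(unembed(h, w_u))
        top = max(range(len(probs)), key=probs.__getitem__)
        readouts.append((vocab[top], probs[top]))
    return readouts


if __name__ == "__main__":
    for step, (token, p) in enumerate(logit_lens(UPDATES, W_U, VOCAB)):
        label = "embedding" if step == 0 else f"layer {step}"
        print(f"after {label}: top token {token!r} (p={p:.2f})")
```

With these toy numbers the top token reads "cat" after the embedding and layer 1, then switches to "mat" at layer 2, and layer 3 raises the probability of "mat" further. That is the kind of layer-by-layer refinement the logit lens makes visible, and it only works because every intermediate state is already in a space the final unembedding can read.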

[1]: https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreti...