by jsenn
551 days ago
> Those layers have different representation space.

Do they? Interpretability techniques like the Logit Lens [1] wouldn't work if this were the case. That author found that, at least for GPT-2, the network almost immediately transforms its hidden state into a "logitable" form: you can unproject the hidden state of any layer to see how that layer incrementally refines the next-token prediction.

[1]: https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreti...
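To make the "unproject any layer" idea concrete, here is a rough toy sketch in plain Python. It is not GPT-2 and not the code from [1]: the unembedding matrix `W_U`, the per-layer update `delta`, and all sizes are made-up values chosen for illustration, and I'm assuming the usual recipe of applying the final layer norm (without learned scale and bias here) followed by the unembedding matrix to each intermediate hidden state.

```python
import math

def layer_norm(h, eps=1e-5):
    # Normalize to zero mean and unit variance (learned scale/bias omitted in this toy).
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]

def unembed(h, W_U):
    # One logit per vocabulary row: dot product of that row with the hidden state.
    return [sum(w * x for w, x in zip(row, h)) for row in W_U]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, W_U):
    # Decode every layer's hidden state as if it were the final one.
    return [softmax(unembed(layer_norm(h), W_U)) for h in hidden_states]

# Hypothetical unembedding matrix: 3 vocabulary tokens, hidden size 4.
W_U = [
    [1.0, 0.0, -1.0, 0.0],   # token 0
    [0.0, 1.0, 0.0, -1.0],   # token 1
    [-1.0, 0.0, 1.0, 0.0],   # token 2
]
TARGET = 1  # the token this toy "network" ends up predicting

# Toy residual stream: start aligned with token 0, and let each "layer"
# add the same made-up update that nudges the state toward token 1.
delta = [-0.2, 0.8, 0.2, -0.8]
hidden_states = [[1.0, 0.0, -1.0, 0.0]]
for _ in range(3):
    hidden_states.append([x + d for x, d in zip(hidden_states[-1], delta)])

per_layer_probs = logit_lens(hidden_states, W_U)
target_probs = [p[TARGET] for p in per_layer_probs]

for layer, p in enumerate(target_probs):
    print(f"layer {layer}: P(token {TARGET}) = {p:.3f}")
```

The point of the toy is only that every layer's state can be decoded with the same `W_U`, which is what you'd expect if the layers share one representation space. On a real model you would take the per-layer hidden states from the model itself instead of constructing them by hand.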