|
|
|
|
|
by oli5679
207 days ago
|
|
This ties directly into the superposition theory. It is believed dense models cram many features into shared weights, making circuits hard to interpret. Sparsity reduces that pressure by giving features more isolated space, so individual neurons are more likely to represent a single, interpretable concept. |
|
https://transformer-circuits.pub/2025/attribution-graphs/met...