|
|
|
|
|
by gormen
113 days ago
|
|
Most interpretability methods fail for LLMs because they try to explain outputs without modeling the intent, constraints, or internal structure that produced them.
Token‑level attribution is useful, but without a framework for how the model reasons, you’re still explaining shadows on the wall. |
|