by jebarker
1036 days ago
When I looked into this briefly, my impression was that mechanistic interpretation is extremely hard beyond very simple cases like CNN classification or toy problems like arithmetic in transformers. That's not to say it isn't a worthy pursuit, but I think for many researchers the difficulty isn't justified, since the results won't make as big a splash as a new model training result.

I don't know if that is the direction, but it's just an example that comes to mind easily.
If someone figures out how to do this, I think their models will be far more capable and reliable.