by wwarner
758 days ago
I suppose, except that for a model with 7B parameters, the number of dropout combinations you'd be analyzing is on the order of 2^(7B), since each weight is either kept or dropped. More importantly, dropout has loss minimization to guide it during training, whereas understanding how a model changes when you edit a few weights is a much broader question.
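For a sense of scale, here is a rough sketch of the counting argument. It assumes each of n weights is independently kept or dropped, which gives 2^n masks; n! would instead count orderings of the weights. Either number is far too large to enumerate, so the script only estimates how many decimal digits each has. The variable names are illustrative.

```python
import math

n = 7_000_000_000  # assumed parameter count ("7B")

# Decimal digits of 2^n: n * log10(2).
digits_pow2 = n * math.log10(2)

# Decimal digits of n!: ln(n!) = lgamma(n + 1), converted to base 10.
digits_fact = math.lgamma(n + 1) / math.log(10)

print(f"2^n has about {digits_pow2:.3e} decimal digits")
print(f"n!  has about {digits_fact:.3e} decimal digits")
```

Both counts have billions of digits, so exhaustive analysis is out of reach under either reading.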
|
When you look at a specific input, you can see which units get activated and which don't. There are also orthogonal but related ideas for inspecting the activations to see the effects.
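As a minimal sketch of what looking at activations for one input could mean, the toy example below runs a single input through a tiny randomly initialized ReLU network and records which units fire in each layer. Everything here is an assumption made for illustration: the network, its sizes, and the helper names `make_layer` and `forward_with_trace` are hypothetical, and a real model would typically be inspected through its framework's own hooks.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def make_layer(n_in, n_out, rng):
    # Hypothetical helper: one weight row per output unit, no bias.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def forward_with_trace(layers, x):
    # Run the input through each layer and record, per layer,
    # which units have a strictly positive activation.
    trace = []
    h = list(x)
    for weights in layers:
        h = [relu(sum(w * v for w, v in zip(row, h))) for row in weights]
        trace.append([a > 0.0 for a in h])
    return h, trace

rng = random.Random(0)  # fixed seed so runs are repeatable
layers = [make_layer(4, 8, rng), make_layer(8, 3, rng)]
output, trace = forward_with_trace(layers, [0.5, -1.0, 0.25, 2.0])

for i, mask in enumerate(trace):
    active = [j for j, on in enumerate(mask) if on]
    print(f"layer {i}: active units {active}")
```

Comparing the recorded masks across different inputs, or before and after editing a weight, is one simple way to see what changed.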