by charcircuit
1037 days ago
>Vague and fairly useless.

When creating a model, your goal is to find one with minimal loss. Being able to figure out how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

>What is it doing to minimize loss?

The value helps us get to a location in the parameter space with lower loss.

>Only weights with values close to or at zero get pruned.

Weights near 0 don't change the results of the calculations they are used in very much, which is why they don't affect loss very much.
I'm sorry, but did you bother reading the previous conversation? We were talking about how much we know about what weights do during inference. "It reduces loss" alone is in fact very vague and useless for interpretability.
>The value helps us get to a location in the parameter space with lower loss.
Which neuron(s) are responsible for capitalization in GPT? You wouldn't get that simply from "it reduces the loss". Our understanding of what individual neurons do is very limited.
>Weights near 0 don't change the results of the calculations they are used in very much, which is why they don't affect loss very much.
I understand that lol.
"This value is literally 0, so it can't affect things much" is a very different level of understanding from "this bunch of weights is redundant because this other set already performs the same function, so it can be pruned. While we're at it, let's also tune this set so it never tries to call the other set."
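A minimal sketch of that contrast in Python, using a hypothetical two-layer ReLU network with made-up weights. `magnitude_prune` zeroes every weight below a threshold without knowing anything about what those weights do. `merge_duplicate_unit` (a hypothetical helper, not any library's API) removes a hidden unit only once we already know it computes the same thing as another unit, folding its outgoing weights into its twin. In a real model, establishing that two units are functionally redundant is exactly the hard interpretability problem; here the duplicate is planted by hand.

```python
def relu(v):
    """Elementwise ReLU."""
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Compute y = W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    """Two-layer network: output = W2 . relu(W1 . x)."""
    return matvec(W2, relu(matvec(W1, x)))

def magnitude_prune(W, threshold):
    """Zero every weight with |w| < threshold. Requires no knowledge of what the weights do."""
    return [[0.0 if abs(w) < threshold else w for w in row] for row in W]

def merge_duplicate_unit(W1, W2, keep, drop):
    """Remove hidden unit `drop`, which must have the same incoming weights as `keep`.

    Both units then produce the same activation, so adding the dropped unit's
    outgoing weights onto the kept unit's outgoing weights leaves the output
    unchanged (up to floating-point rounding)."""
    assert W1[keep] == W1[drop], "units must compute the same function"
    W1_new = [row for i, row in enumerate(W1) if i != drop]
    W2_new = []
    for row in W2:
        merged = list(row)
        merged[keep] += merged[drop]  # fold the duplicate's contribution into its twin
        del merged[drop]
        W2_new.append(merged)
    return W1_new, W2_new

# Hypothetical weights: unit 0 has one tiny weight, unit 2 duplicates unit 1.
W1 = [
    [0.8, -0.5, 0.0003],
    [0.4, 0.9, -0.6],
    [0.4, 0.9, -0.6],
]
W2 = [[1.0, 0.5, 0.25]]
x = [1.0, 0.5, -1.0]

y = forward(W1, W2, x)
y_pruned = forward(magnitude_prune(W1, 1e-3), W2, x)
W1_merged, W2_merged = merge_duplicate_unit(W1, W2, keep=1, drop=2)
y_merged = forward(W1_merged, W2_merged, x)

print(f"original: {y[0]:.5f}")         # 1.63720
print(f"pruned:   {y_pruned[0]:.5f}")  # 1.63750 (small change)
print(f"merged:   {y_merged[0]:.5f}")  # 1.63720 (one fewer hidden unit)
```

The pruning step only tells you the tiny weight barely mattered; the merge step is only possible because you already know which unit does what, which is the kind of understanding the thread is asking for.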