by nothing0001 1211 days ago
Reading the paper, I was thinking about the following: given two models with weights w1 and w2, for each neuron k compute some average of the absolute difference between the two models' outputs at neuron k over the training set. Perhaps the neurons with low differences are the ones that capture general knowledge shared by w1 and w2. Just an idea.
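A minimal sketch of the statistic described above, assuming a single fully connected ReLU layer and a toy random dataset. The function names, the layer type, and the toy weights are illustrative assumptions, not anything taken from the paper.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def layer_outputs(weights, biases, x):
    """Outputs of one fully connected ReLU layer; weights[k] is the weight row of neuron k."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def mean_abs_diff(model1, model2, dataset):
    """For each neuron k, average |out1_k(x) - out2_k(x)| over the dataset."""
    (w1, b1), (w2, b2) = model1, model2
    totals = [0.0] * len(w1)
    for x in dataset:
        o1 = layer_outputs(w1, b1, x)
        o2 = layer_outputs(w2, b2, x)
        for k, (u, v) in enumerate(zip(o1, o2)):
            totals[k] += abs(u - v)
    return [t / len(dataset) for t in totals]

rng = random.Random(0)
dataset = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]

# Toy setup (assumption for illustration): model 2 copies neuron 0 of model 1
# exactly and has unrelated random weights for neurons 1 and 2.
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
b1 = [0.0, 0.0, 0.0]
w2 = [list(w1[0])] + [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b2 = [0.0, 0.0, 0.0]

diffs = mean_abs_diff((w1, b1), (w2, b2), dataset)
print(diffs)  # neuron 0 has difference 0.0; the unrelated neurons do not
```

In this toy case the shared neuron shows up with a zero difference, but only because it sits at the same index k in both models.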

Two problems:

(1) In different models, even ones of exactly the same type and training approach, neuron number k will typically play completely different roles. This is because a neural net is built from layers of neurons, and permuting the order of the neurons within a layer (together with the matching weights into the next layer) gives an exactly equivalent network. Because of randomization in initialization and training, every such permutation of the resulting weights is equally likely to be produced. So at the very least you'd have to look for corresponding neurons k1 in net 1 and k2 in net 2.

(2) Most properties you might look for will not correspond to a single neuron but rather to a relationship between many different neurons. And since neural nets are so flexible that there are many ways to encode approximately the same function, there is no reason to expect that, for your chosen "general knowledge" item, net 1 and net 2 will use the same number of neurons or even the same general approach to encode it.
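To illustrate point (1), here is a small sketch of the permutation symmetry, assuming a two-layer ReLU MLP with a scalar output. The network shape, the permutation, and all names are made up for the example.

```python
import random

def relu(x):
    return max(0.0, x)

def mlp(x, W1, b1, W2):
    """Two-layer net: hidden = relu(W1 x + b1), output = W2 . hidden (a scalar)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden))

rng = random.Random(1)
n_in, n_hidden = 3, 4
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]

# Reorder the hidden neurons: rows of W1, entries of b1, and the matching
# entries of W2 all move together.
perm = [2, 0, 3, 1]
W1p = [W1[i] for i in perm]
b1p = [b1[i] for i in perm]
W2p = [W2[i] for i in perm]

xs = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(50)]
max_gap = max(abs(mlp(x, W1, b1, W2) - mlp(x, W1p, b1p, W2p)) for x in xs)
print(max_gap)  # zero up to floating-point rounding
```

The two nets compute the same function, yet hidden neuron 0 of the permuted net is hidden neuron 2 of the original, so comparing "neuron k" index by index compares unrelated units.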

They started with a pre-trained model, so the symmetries were already broken. Despite whatever randomness there is in the fine-tuning process, it may well have followed the same groove every time in training.
I think you are absolutely right, but those same problems apply when the paper claims that the average of two models gives a good model. So in that case the weight space could have additional properties that make the proposed approach a little more plausible with some modifications. As you suggest, features are encoded in many different ways and across many neurons, so the suggested approach could only be applied to features that are encoded using a single neuron. To narrow down the ways the features can be encoded, the proposal could be applied to an encoding of both models, looking for matching neurons using the L1 norm of the difference of their outputs as the distance.
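A rough sketch of that matching step, assuming one hidden ReLU layer without biases and a brute-force search over permutations, which only works for a handful of neurons. The helper names and the toy nets are assumptions for illustration.

```python
import itertools
import random

def relu(x):
    return max(0.0, x)

def activations(W, dataset):
    """acts[k][n] = output of hidden neuron k on sample n."""
    return [[relu(sum(w * xi for w, xi in zip(row, x))) for x in dataset]
            for row in W]

def l1_cost(acts_a, acts_b):
    """cost[i][j] = mean |neuron i of net A - neuron j of net B| over the dataset."""
    n = len(acts_a[0])
    return [[sum(abs(u - v) for u, v in zip(a, b)) / n for b in acts_b]
            for a in acts_a]

def best_matching(cost):
    """Brute-force minimum-cost one-to-one matching (factorial time)."""
    k = len(cost)
    return min(itertools.permutations(range(k)),
               key=lambda p: sum(cost[i][p[i]] for i in range(k)))

rng = random.Random(2)
dataset = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]

# Toy case: net B is net A with its hidden neurons shuffled.
W_a = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
perm = [2, 0, 3, 1]
W_b = [W_a[i] for i in perm]  # neuron j of B is neuron perm[j] of A

cost = l1_cost(activations(W_a, dataset), activations(W_b, dataset))
match = best_matching(cost)  # match[i] = index in B paired with neuron i of A
print(match)
```

For realistic layer widths the brute-force search would need to be replaced by a proper assignment solver such as `scipy.optimize.linear_sum_assignment`. Even then this only finds one-to-one correspondences, so it does not address features spread across many neurons.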
>Because of randomization in initialization and training

Randomization in initialization seems like a pragmatic thing to do from a 'make the math work' perspective, but a really counter-intuitive thing to do when compared with the training process of our own wetware substrate. I know it's not a fair comparison, but it's an interesting thought to me.