by nothing0001 1211 days ago
I think you are absolutely right, but those same problems apply when the paper claims that the average of two models gives a good model. So in that case the weight space could have additional properties that make the proposed approach a little more plausible, with some modifications. As you suggest, features are encoded in many different ways and across many neurons, so the suggested approach could only be applied to features that are encoded by a single neuron. To narrow down the ways features can be encoded, the proposal could be applied to an encoding of both models, looking for matching neurons using the L1 norm of the difference of their outputs as the distance.
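To make the matching idea concrete, here is a rough sketch of what I mean, not something taken from the paper. It assumes you have recorded each neuron's outputs on the same set of probe inputs for both models; the names `l1_distance` and `match_neurons` and the toy data are made up for illustration, and the greedy pairing is just the simplest choice I could think of.

```python
def l1_distance(u, v):
    """L1 norm of the difference between two equal-length output vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def match_neurons(outputs_a, outputs_b):
    """Greedily pair neurons of model A with neurons of model B.

    outputs_a[i] and outputs_b[j] hold the outputs of neuron i / neuron j
    on the same probe inputs (an assumption of this sketch).
    Returns (i, j, distance) tuples, closest pairs first.
    """
    # Every pairwise distance, sorted from closest to farthest.
    candidates = sorted(
        (l1_distance(u, v), i, j)
        for i, u in enumerate(outputs_a)
        for j, v in enumerate(outputs_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for dist, i, j in candidates:
        # Take a pair only if neither neuron has been matched yet.
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, dist))
    return pairs


# Toy data: model B has roughly the same three neurons as model A,
# but in a different order and slightly perturbed.
outputs_a = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0], [-1.0, 0.0, 1.0]]
outputs_b = [[5.0, 5.0, 5.5], [-1.0, 0.0, 1.0], [0.0, 1.0, 2.5]]
pairs = match_neurons(outputs_a, outputs_b)
```

Greedy pairing is not guaranteed to minimize the total distance; an optimal assignment (e.g. the Hungarian algorithm) would be the natural replacement if that matters. Note also that this only recovers features that live in a single neuron, which is exactly the limitation discussed above.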