by 988747
1729 days ago
> In practice, these huge models are, in layman's terms, fucking awesome and work really well, e.g. they generalize and work in production. No one understands why. How about the resulting weights? If most of them are close to 0, then that would mean that part of the training is for the NN to learn which of the 1.5B parameters are relevant and which are not.

Maybe true, but even then it's only part of the story: kernels in CNNs genuinely seem to learn features like edges and textures.
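
For anyone who wants to poke at both claims, here is a rough pure-Python sketch. It is not taken from any real model: `near_zero_fraction`, `correlate2d_valid`, the `eps` threshold and the toy image are all made up for illustration, and the Sobel kernel is a hand-written stand-in for the kind of edge detector a first CNN layer might end up learning.

```python
def near_zero_fraction(weights, eps=1e-3):
    """Fraction of weights whose magnitude is below eps (eps is an arbitrary cutoff)."""
    weights = list(weights)
    if not weights:
        raise ValueError("weights must be non-empty")
    return sum(1 for w in weights if abs(w) < eps) / len(weights)


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    This is cross-correlation (no kernel flip), which is what CNN layers
    usually compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-written vertical-edge kernel (Sobel), standing in for a learned one.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x6 toy image: dark left half, bright right half.
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# The response is nonzero only where the window straddles the dark/bright edge.
response = correlate2d_valid(IMAGE, SOBEL_X)
```

On real weights you would flatten each layer into one list before calling `near_zero_fraction`. A high fraction would be consistent with the "learning which parameters matter" idea, but it would not prove it on its own.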