by bmh
968 days ago
It's interesting that the standard "K" (number of elements with a shared scale) is 32. That seems to imply that the neural network will somehow learn to group weights at those 32-element boundaries.
Does anybody understand how that works? I mean, what is the mechanism that naturally causes the model to group weight scales into those K-element clusters?
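To make the question concrete, here is a rough sketch of what I take "K elements with a shared scale" to mean. It assumes the scale is a power of two picked per block from the largest magnitude in that block, and that elements are stored as 8-bit integers. Both are my assumptions for illustration, not something the article confirms, and the function names (`quantize_block`, `quantize`, `dequantize`) are made up.

```python
import math

K = 32  # block size mentioned above; the rest of this sketch is assumed


def quantize_block(block, bits=8):
    """Quantize one block of floats using a single shared power-of-two scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit signed integers
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest exponent such that amax / 2**exp fits in [-qmax, qmax].
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    # Round each element to the nearest integer step and clamp to the range.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return exp, q


def quantize(weights, k=K):
    """Split a flat list into consecutive k-element blocks and quantize each."""
    return [quantize_block(weights[i:i + k]) for i in range(0, len(weights), k)]


def dequantize(blocks):
    """Reconstruct approximate floats from (exponent, integers) pairs."""
    out = []
    for exp, q in blocks:
        out.extend(v * 2.0 ** exp for v in q)
    return out


if __name__ == "__main__":
    # Two blocks with very different magnitudes get different exponents.
    weights = [0.001 * (i - 16) for i in range(K)] + [10.0 * (i - 16) for i in range(K)]
    blocks = quantize(weights)
    print([exp for exp, _ in blocks])
```

In this sketch the blocks are just consecutive runs of K values in memory, and each block's scale is computed from whatever values happen to land in it. Whether training also nudges neighbouring weights toward similar magnitudes is exactly the part I don't understand.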