|
|
|
|
|
by jbay808
1116 days ago
|
|
I would interpret this as showing that matrix multiplication code is carefully engineered to correctly implement... well, matrix multiplication. Stumbling on that specific mapping of 96 input bits to 96 output bits would be hard to pick out of a hat by chance, from the set of all possible mappings. Learning that precise mapping, starting from a uniform prior and only given a finite set of examples, could be seen as an impressive task, although less impressive than sorting. If a model learns the correct mapping -- and better yet, needs only 9 parameters to implement it -- then I think it's fairer to say the model does matrix multiplication, rather than that the model convincingly imitates the statistics of matrix multiplication. |
|