by drdeca
1514 days ago
Well, the main issue I see is where to put the n^2 memory (where n is the number of digits) when doing multiplication. (Or rather, it doesn’t need n^2 space; it could be done in less, but that might require building more structure into the architecture.) If the weights are designed by hand, and the network architecture allows something to hold the information needed, then there is really no obstacle to having it get multiplication entirely right (not just 90%). Now, would that be learnable? I’m not so sure, at least with the architecture one would use if designing the weights. But I see no reason a transformer model couldn’t be trained on multiplication-with-work-shown, produce text fitting all of those patterns, and successfully perform multiplication for many digits that way. And by “showing all work” I don’t necessarily mean “in a way a person would typically show their work”, but in an easier-for-machine way.
|
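To make "work shown in an easier-for-machine way" a bit more concrete, here is a rough sketch of what such training text could look like. The line format and the function name `multiplication_with_work` are made up for illustration, not taken from any real dataset or paper. Each line carries one partial product plus a running sum, so the text never has to hold the whole n-by-n table of digit products at once.

```python
def multiplication_with_work(a: int, b: int) -> list[str]:
    """Return scratchpad lines for a * b via schoolbook long multiplication.

    Hypothetical format: one partial product per line, each followed by
    the running sum so far, then a final answer line.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")

    lines = [f"{a} * {b}"]
    running = 0
    # Walk the digits of b from least significant to most significant.
    for place, ch in enumerate(reversed(str(b))):
        digit = int(ch)
        # Partial product for this digit, shifted by its decimal place.
        partial = a * digit * 10 ** place
        running += partial
        lines.append(f"{a} * {digit} e{place} = {partial} ; sum = {running}")
    lines.append(f"answer = {running}")
    return lines


if __name__ == "__main__":
    print("\n".join(multiplication_with_work(123, 45)))
```

For 123 * 45 this emits a header line, two partial-product lines, and `answer = 5535`. Whether a transformer actually learns to continue text in this format reliably for many digits is the open question; the sketch only shows that the target text is mechanical to generate.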