|
|
|
|
|
by whimsicalism
1543 days ago
|
|
It would be trivial for a network of this size to code general rules for multiplication. At a certain point, when you have enough data, finding the actual rule is actually the easier solution than memorizing each data point. This is the key insight of deep learning. |
|
More fundamentally, any finite neural net is either constant or linear outside the training sample,depending on the activation function. Unless you design special neurons like in the paper above, which solves this specific problem for arithmetic, but not the general problem of extrapolation.