Hacker News
by lostmsu 1402 days ago
I am sure an LSTM, or even a small two-layer network, can learn to predict the next reading on a clock from the reading exactly one second earlier, but that is not the point. Given such a network, predicting the reading at second N = 10^315 would take forever, because you would have to apply it N times, one second at a time, whereas a modulo operation gives the answer almost immediately.
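To make the contrast concrete, here is a rough Python sketch. The `step` function is only a hand-written stand-in for the learned one-second predictor (no actual network is involved), and the 24-hour `(h, m, s)` clock starting at midnight is my assumption, not something specified above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def step(clock):
    """Advance an (h, m, s) clock by one second.

    Hypothetical stand-in for a learned one-step predictor.
    """
    h, m, s = clock
    s += 1
    if s == 60:
        s, m = 0, m + 1
    if m == 60:
        m, h = 0, h + 1
    if h == 24:
        h = 0
    return (h, m, s)

def clock_by_stepping(n, start=(0, 0, 0)):
    """Reading after n seconds, obtained by n one-second steps: O(n) work."""
    clock = start
    for _ in range(n):
        clock = step(clock)
    return clock

def clock_by_modulo(n):
    """Reading after n seconds from midnight, via remainders only."""
    n %= SECONDS_PER_DAY
    return (n // 3600, (n % 3600) // 60, n % 60)

# Both agree for a small n, but only the modulo version is usable for 10**315.
small = clock_by_stepping(100_000)
big = clock_by_modulo(10**315)
```

Python integers are arbitrary precision, so `clock_by_modulo(10**315)` returns essentially instantly, while `clock_by_stepping(10**315)` would never finish.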

I guess the broader point is that while the universal approximation theorem is a get-out-of-jail-free card for neural nets, nobody said "how quickly". It's a bit like approximating sin(x) with a Taylor series for large x: suboptimal. Chebyshev polynomials are better (and Padé approximants probably better still), but both "work". Unless you have data points on the boundary of the domain, or some analytic knowledge that the solution space is bounded somehow, expect weird shit...
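A quick way to see the Taylor vs. Chebyshev gap is the sketch below, in plain Python with no libraries. The interval [-10, 10], the degree 15, and the choice of interpolation at Chebyshev nodes are my own picks for illustration; Padé approximants are left out.

```python
import math

HALF_WIDTH = 10.0   # approximate sin on [-10, 10] (assumed interval)
DEGREE = 15         # same polynomial degree for both methods (assumed)

def taylor_sin(x, degree):
    """Taylor polynomial of sin around 0, truncated at the given degree."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((degree + 1) // 2))

def chebyshev_coeffs(f, n, half_width):
    """Coefficients of the degree n-1 interpolant of f at n Chebyshev nodes."""
    coeffs = []
    for k in range(n):
        total = 0.0
        for j in range(n):
            theta = math.pi * (j + 0.5) / n
            total += f(half_width * math.cos(theta)) * math.cos(k * theta)
        coeffs.append(2.0 * total / n)
    return coeffs

def chebyshev_eval(coeffs, x, half_width):
    """Evaluate c0/2 + sum_k c_k T_k(x / half_width), with T_k(t) = cos(k acos t)."""
    t = max(-1.0, min(1.0, x / half_width))
    theta = math.acos(t)
    return coeffs[0] / 2 + sum(c * math.cos(k * theta)
                               for k, c in enumerate(coeffs) if k > 0)

coeffs = chebyshev_coeffs(math.sin, DEGREE + 1, HALF_WIDTH)
grid = [-HALF_WIDTH + 2 * HALF_WIDTH * i / 2000 for i in range(2001)]

# Worst-case absolute error of each approximation over the whole interval.
taylor_err = max(abs(taylor_sin(x, DEGREE) - math.sin(x)) for x in grid)
cheb_err = max(abs(chebyshev_eval(coeffs, x, HALF_WIDTH) - math.sin(x)) for x in grid)
print(f"Taylor max error:    {taylor_err:.3g}")
print(f"Chebyshev max error: {cheb_err:.3g}")
```

Both are degree-15 polynomials, but the Taylor one spends all its accuracy near 0 and blows up toward the ends of the interval, while the Chebyshev interpolant spreads the error across the whole range.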
The remark about the Chebyshev polynomials just sent me down another rabbit hole.