by rryan 1402 days ago
You're using a pointwise loss, which means you are treating each timepoint as conditionally independent. That's a deeply flawed assumption, probabilistically.

Try an autoregressive model of the joint probability distribution of the sine wave timepoints -- like WaveNet [1]. It will nail your sine wave -- just as it nails mixtures of sine waves (speech and music). :)

[1] https://arxiv.org/abs/1609.03499
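To make the autoregressive idea concrete, here is a minimal sketch. It is not WaveNet: it swaps in a plain linear AR(2) model fitted by least squares, which is enough for a single sine because sin(w(t+1)) = 2cos(w)sin(wt) - sin(w(t-1)). The frequency OMEGA, the training length and the helper names fit_ar2 and rollout are assumptions chosen for illustration.

```python
import math

def fit_ar2(xs):
    """Least-squares fit of x[t+1] ~ a*x[t] + b*x[t-1] via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for prev, cur, nxt in zip(xs, xs[1:], xs[2:]):
        s11 += cur * cur
        s12 += cur * prev
        s22 += prev * prev
        r1 += nxt * cur
        r2 += nxt * prev
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b

def rollout(a, b, x_prev, x_cur, steps):
    """Generate `steps` future values, feeding each prediction back in as input."""
    out = []
    for _ in range(steps):
        x_prev, x_cur = x_cur, a * x_cur + b * x_prev
        out.append(x_cur)
    return out

OMEGA = 0.3  # hypothetical angular frequency per sample (an assumption)
train = [math.sin(OMEGA * t) for t in range(100)]
a, b = fit_ar2(train)

# Predict 200 steps past the training data and compare with the true sine.
future = rollout(a, b, train[-2], train[-1], 200)
truth = [math.sin(OMEGA * t) for t in range(100, 300)]
max_err = max(abs(p - q) for p, q in zip(future, truth))
print(f"a={a:.6f} (2cos(w)={2 * math.cos(OMEGA):.6f}), b={b:.6f}, max rollout error={max_err:.2e}")
```

Each prediction is conditioned on the previous two values rather than treated as independent, so on noise-free data the fit should recover the recurrence and extrapolate well beyond the training window.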


I am sure an LSTM or even a small two-layer network can learn to predict the next numbers on a clock from the numbers shown exactly one second earlier, but that is not the point. Given such a network, it will take you forever to make a prediction for second number N=10^315, whereas computing a modulo takes almost no time.
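A rough sketch of that contrast, with an exact one-second step function standing in for the learned network (an assumption; a real network would only approximate it). The names step, predict_by_stepping and predict_by_modulo are made up for illustration, and the clock is modelled as seconds since midnight.

```python
SECONDS_PER_DAY = 86_400

def step(second_of_day):
    """Stand-in for the learned model: advance the clock by exactly one second."""
    return (second_of_day + 1) % SECONDS_PER_DAY

def predict_by_stepping(start, n):
    """Apply the one-second model n times; the cost grows linearly with n."""
    s = start
    for _ in range(n):
        s = step(s)
    return s

def predict_by_modulo(start, n):
    """Closed form: a single modulo, cheap even for astronomically large n."""
    return (start + n) % SECONDS_PER_DAY

start = 12 * 3600  # noon, an arbitrary starting point

# Both approaches agree wherever stepping is still feasible.
assert predict_by_stepping(start, 100_000) == predict_by_modulo(start, 100_000)

# Only the closed form is usable for N = 10^315.
far = predict_by_modulo(start, 10**315)
print(far // 3600, (far % 3600) // 60, far % 60)
```

Stepping 10^315 times is out of reach for any computer, while Python's arbitrary-precision integers make the modulo version return immediately.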
I guess a broader point is that while the universal approximation theorem is a get-out-of-jail-free card for neural nets, nobody said "how quickly". It is a bit like approximating sin(x) with a Taylor series for large x: it is suboptimal, and Chebyshev polynomials are better (Padé approximants probably better still), but both "work". Unless you have data points on the boundary of the domain, or some analytic knowledge that the solution space is bounded somehow, expect weird shit...
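For a feel of the Taylor versus Chebyshev gap, here is a small pure-Python sketch comparing two degree-11 polynomials for sin(x) on [-2π, 2π]. The interval, the degree and the helper names are assumptions chosen for illustration; the Chebyshev side is interpolation at Chebyshev nodes evaluated with Clenshaw's recurrence, and Padé approximants are not covered.

```python
import math

def taylor_sin(x, terms):
    """Maclaurin series of sin(x) truncated to `terms` nonzero terms."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cheb_coeffs(f, degree, a, b):
    """Coefficients of the Chebyshev interpolant of f on [a, b] at Chebyshev nodes."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [f(0.5 * (b - a) * math.cos(t) + 0.5 * (a + b)) for t in thetas]
    coeffs = [2.0 / n * sum(v * math.cos(k * t) for v, t in zip(values, thetas))
              for k in range(n)]
    coeffs[0] *= 0.5
    return coeffs

def cheb_eval(coeffs, x, a, b):
    """Evaluate sum_k c_k T_k(u) with Clenshaw's recurrence, u = x mapped to [-1, 1]."""
    u = (2.0 * x - a - b) / (b - a)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * u * b1 - b2 + c, b1
    return u * b1 - b2 + coeffs[0]

A, B = -2 * math.pi, 2 * math.pi
DEGREE = 11  # Taylor with 6 nonzero terms is also a degree-11 polynomial
coeffs = cheb_coeffs(math.sin, DEGREE, A, B)

# Measure the worst-case error of each approximation on a uniform grid.
grid = [A + (B - A) * i / 1000 for i in range(1001)]
taylor_err = max(abs(taylor_sin(x, 6) - math.sin(x)) for x in grid)
cheb_err = max(abs(cheb_eval(coeffs, x, A, B) - math.sin(x)) for x in grid)
print(f"max error, Taylor: {taylor_err:.3g}  Chebyshev: {cheb_err:.3g}")
```

The Taylor polynomial is very accurate near 0 and degrades quickly toward the ends of the interval, while the Chebyshev interpolant spreads its error across the whole domain. Both "work" in the limit of high degree, just at very different rates.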
The remark about the Chebyshev polynomials just sent me down another rabbit hole.