by rryan 1402 days ago

You're using a pointwise loss, which means you are treating each timepoint as conditionally independent. That's a deeply flawed assumption, probabilistically.

Try an autoregressive model of the joint probability distribution of the sine wave timepoints -- like WaveNet [1]. It will nail your sine wave -- just as it nails mixtures of sine waves (speech and music). :)

[1] https://arxiv.org/abs/1609.03499
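
To make the autoregressive idea concrete, here is a minimal sketch. It is not WaveNet: it swaps in a plain linear autoregressive predictor of order 2, fitted by least squares, which is enough for a single sine because x[t+1] = 2*cos(w)*x[t] - x[t-1]. The frequency, phase, and all function names are assumptions chosen for the demo.

```python
import math

OMEGA, PHASE = 0.3, 0.5  # hypothetical frequency and phase for the demo


def sine(t):
    return math.sin(OMEGA * t + PHASE)


def fit_ar2(series):
    """Least-squares fit of x[t+1] ~ a*x[t] + b*x[t-1] via 2x2 normal equations."""
    sxx = sxy = syy = sxz = syz = 0.0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        sxx += cur * cur
        sxy += cur * prev
        syy += prev * prev
        sxz += cur * nxt
        syz += prev * nxt
    det = sxx * syy - sxy * sxy
    a = (sxz * syy - sxy * syz) / det  # Cramer's rule
    b = (sxx * syz - sxy * sxz) / det
    return a, b


def rollout(a, b, x_prev, x_cur, steps):
    """Feed each prediction back in as input, one step at a time."""
    out = []
    for _ in range(steps):
        x_prev, x_cur = x_cur, a * x_cur + b * x_prev
        out.append(x_cur)
    return out


train = [sine(t) for t in range(100)]
a, b = fit_ar2(train)
future = rollout(a, b, train[-2], train[-1], 200)
max_err = max(abs(p - sine(100 + i)) for i, p in enumerate(future))
print(a, b, max_err)
```

The fitted coefficients should land very close to 2*cos(OMEGA) and -1, because each sample is conditioned on the previous two rather than predicted from the time index alone.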

lostmsu 1402 days ago

I am sure an LSTM, or even a small 2-layer network, can learn to predict the next numbers on a clock from the numbers shown exactly one second earlier, but that is not the point. Given such a network, it would take you forever to make a prediction for second N = 10^315, whereas a single modulo operation gives the answer almost instantly.
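
A rough sketch of the gap, assuming a clock that only shows seconds 0 to 59. `step` is a hypothetical stand-in for a learned one-step predictor, not an actual network; the point is only that rolling it forward costs N calls, while the closed form costs one modulo.

```python
def step(second):
    """Stand-in for a learned one-step predictor: the next clock reading."""
    return (second + 1) % 60


def predict_by_rollout(start, n):
    """Apply the one-step predictor n times: O(n) work."""
    s = start
    for _ in range(n):
        s = step(s)
    return s


def predict_closed_form(start, n):
    """Jump straight to the answer with one modulo."""
    return (start + n) % 60


# Both agree where the rollout is still feasible.
assert predict_by_rollout(7, 1000) == predict_closed_form(7, 1000)

# Only the closed form is usable here; a rollout would need 10**315 steps.
far_future = predict_closed_form(0, 10**315)
print(far_future)
```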

azalemeth 1402 days ago

I guess a broader point is that while the universal approximation theorem is a get-out-of-jail-free card for neural nets, nobody said how quickly the approximation converges. It's a bit like approximating sin(x) with a Taylor series for large x: it's suboptimal. Chebyshev polynomials are better (and Padé approximants probably better still), but both "work". Unless you have data points on the boundary of the domain, or some analytic knowledge that the solution space is bounded somehow, expect weird shit...
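
A quick way to see the difference is to compare maximum errors over an interval. The sketch below assumes the interval [-10, 10] and degree 15 for both polynomials (both choices are arbitrary), builds the Chebyshev version by interpolating at Chebyshev nodes, and leaves Padé approximants out. Function names are made up for the example.

```python
import math


def taylor_sin(x, degree):
    """Maclaurin polynomial of sin, truncated at the given degree."""
    total, term, k = 0.0, x, 0
    while 2 * k + 1 <= degree:
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))  # next odd-power term
        k += 1
    return total


def chebyshev_coeffs(f, degree, lo, hi):
    """Coefficients of the Chebyshev interpolant of f on [lo, hi]."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (hi - lo) * t + 0.5 * (hi + lo)) for t in nodes]
    return [
        (2.0 / n) * sum(v * math.cos(j * math.pi * (k + 0.5) / n)
                        for k, v in enumerate(values))
        for j in range(n)
    ]


def chebyshev_eval(coeffs, x, lo, hi):
    """Evaluate the Chebyshev series at x with the Clenshaw recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)  # map x to [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + 0.5 * coeffs[0]


LO, HI, DEGREE = -10.0, 10.0, 15  # assumed interval and degree
coeffs = chebyshev_coeffs(math.sin, DEGREE, LO, HI)
grid = [LO + (HI - LO) * i / 1000 for i in range(1001)]
taylor_err = max(abs(taylor_sin(x, DEGREE) - math.sin(x)) for x in grid)
cheb_err = max(abs(chebyshev_eval(coeffs, x, LO, HI) - math.sin(x)) for x in grid)
print(taylor_err, cheb_err)
```

The Taylor polynomial is excellent near 0 and blows up toward the ends of the interval, while the Chebyshev interpolant spreads its error across the whole range.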

lostmsu 1402 days ago

The remark about the Chebyshev polynomials just sent me down another rabbit hole.