Hacker News new | ask | show | jobs
by hacketthfk 890 days ago
I don't understand something, why do they claim they go from O(N*N) to O(N), but all they claim they are doing is removing one exponentiation operation, which is O(1)? Where is the O(N) they are removing?
2 comments

Removing the exponential allows some linear algebra based tricks. It makes the state space linear. Linearity allows a kind of running sum, where the state space at time T is quickly computable from the state space at time T-1.

That linearity model simplification has model expressiveness costs, which is why they don't fit the training data as well.

Wonder if it'd ve possible to have our cake and eat it too by treating layer outputs as log(state space) in that case?
It's described explicitly in section 1 where they first reduce to a linear relationship and then recognize that a portion of the formula can be captured in a state variable, and rewrite as a recurrence relation.

By persisting the state variable across subsequent computations they transform the quadratic formula for computing output into a linear formula computing output and next state from current state.

It's kind of like memoization, but since it's a number it's constant space too.