| HN Mirror



	by robrenaud 890 days ago
	Removing the exponential allows some linear-algebra-based tricks. It makes the state space linear. Linearity allows a kind of running sum, where the state at time T is quickly computable from the state at time T-1. That simplification costs the model expressiveness, which is why these models don't fit the training data as well.
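
A rough sketch of the running-sum idea, assuming the comment is about attention-style scoring where exp(q·k) is replaced by the plain dot product q·k (that reading, and every name below, is an assumption rather than anything stated in the thread). Normalization is left out to keep it short. Both functions compute the same outputs; the second carries a state forward instead of re-scanning the history.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_outputs(qs, ks, vs):
    """O(T^2): at each step t, re-scan every key/value pair seen so far."""
    outs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for k, v in zip(ks[: t + 1], vs[: t + 1]):
            w = dot(q, k)  # plain dot product, no exponential
            out = [o + w * x for o, x in zip(out, v)]
        outs.append(out)
    return outs

def running_outputs(qs, ks, vs):
    """O(T): keep a state S with S_t = S_{t-1} + outer(k_t, v_t)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outs = []
    for q, k, v in zip(qs, ks, vs):
        # Update the running sum with this step's outer product.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Output is q times the state; no look back at earlier steps.
        outs.append([sum(q[i] * state[i][j] for i in range(d_k))
                     for j in range(d_v)])
    return outs
```

With exp(q·k) in place of q·k, the weight no longer factors into a q part times a k part, so the history can't be folded into one fixed-size state like this.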

1 comment

Wonder if it'd be possible to have our cake and eat it too by treating layer outputs as log(state space) in that case?