HN Mirror



	by iknownothow 384 days ago
	> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

	Does this mean the models can be smaller too (on top of the primary benefit of being faster)?

1 comment

Lerc 384 days ago

Reduced memory consumption for context, perhaps, but hidden state is different from weights. I don't think this would improve the model's capability per model parameter (but as with everything in ML, I wouldn't bet against it until it's been tested).
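
A rough sketch of the distinction, for illustration only. It counts how many per-sequence state "slots" each scheme would keep at inference time for a context of `seq_len` tokens. The function name `state_slots` and the `floor(log2(T)) + 1` count for the log-linear case are assumptions made for this example, not figures taken from the paper. None of these counts touch the number of weights, which is fixed by the architecture and does not depend on context length.

```python
def state_slots(seq_len: int) -> dict:
    """Illustrative count of inference-time state slots for a context of seq_len tokens.

    These are memory-for-context figures only; the model's weight count is
    independent of seq_len in all three cases.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    return {
        # Softmax attention: one cached key/value pair per past token.
        "softmax_kv_cache": seq_len,
        # Linear attention: a single fixed-size hidden state.
        "linear_attention": 1,
        # Log-linear attention: assumed here to keep floor(log2(T)) + 1 states.
        # int.bit_length() computes exactly that for positive integers.
        "log_linear": seq_len.bit_length(),
    }


if __name__ == "__main__":
    for t in (1, 1024, 1_000_000):
        print(t, state_slots(t))
```

Under these assumptions the per-context state grows from 1 slot to about 20 as the context goes from one token to a million, while a KV cache grows a million-fold. That is a saving in context memory, not a reduction in model size.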
