New DeepSeek paper: Natively Trainable Sparse Attention mechanism

	New DeepSeek paper: Natively Trainable Sparse Attention mechanism (twitter.com)
	5 points by redlock 529 days ago

1 comment

Authored and uploaded by none other than Liang Wenfeng himself.