Hacker News
New DeepSeek paper: Natively Trainable Sparse Attention mechanism (twitter.com)
5 points by redlock 482 days ago
1 comment

Authored and uploaded by none other than Liang Wenfeng himself.