Hacker News
Native Sparse Attention: Hardware-Aligned and Natively Trainable (arxiv.org)
2 points by teepo 490 days ago