Hacker News
DashAttention: Differentiable and Adaptable Sparse Hierarchical Attention (arxiv.org)
9 points by cmogni1 33 days ago