Hacker News
Kimi introduces Attention Residuals: 1.25x compute performance at <2% overhead (arxiv.org)
9 points by nekofneko 101 days ago