Hacker News
Gated Attention for Large Language Models (arxiv.org)
1 point by xnhbx 197 days ago