Hacker News
The tug-of-war between cache and capacity: from MHA, MQA, GQA to MLA (yuxi-liu-wired.github.io)
1 point by YuxiLiuWired 507 days ago