Hacker News
by ur-whale 483 days ago
For those wondering: it's somewhat likely that MLA means Multi-head Latent Attention.

https://verticalserve.medium.com/group-query-attention-58283...

https://paperswithcode.com/method/multi-head-attention
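This toy sketch shows how MLA differs from the two linked schemes. It assumes the MLA meant here is the DeepSeek-V2-style design. Plain multi-head attention caches full per-head keys and values, and GQA caches fewer shared KV heads. In MLA, each token's hidden state is down-projected to one small latent vector, only that latent is cached, and per-head keys and values are up-projected from it at attention time. All names and dimensions below are hypothetical. The sketch also omits the decoupled RoPE key, query compression, and the inference trick of folding the up-projections into the query and output projections.

```python
import math
import random

# Toy dimensions (hypothetical, chosen only for illustration).
D_MODEL, N_HEADS, D_HEAD, D_LATENT = 8, 2, 4, 3

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class ToyLatentAttention:
    """One-layer sketch: cache a shared low-rank latent per token and
    up-project it to per-head keys and values when attending."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_down = rand_matrix(D_LATENT, D_MODEL, rng)  # hidden -> latent
        self.w_q = [rand_matrix(D_HEAD, D_MODEL, rng) for _ in range(N_HEADS)]
        self.w_uk = [rand_matrix(D_HEAD, D_LATENT, rng) for _ in range(N_HEADS)]
        self.w_uv = [rand_matrix(D_HEAD, D_LATENT, rng) for _ in range(N_HEADS)]
        self.cache = []  # one latent vector (length D_LATENT) per past token

    def step(self, h):
        """Process one new token's hidden state h; return concatenated head outputs."""
        self.cache.append(matvec(self.w_down, h))  # only the latent is cached
        out = []
        for i in range(N_HEADS):
            q = matvec(self.w_q[i], h)
            keys = [matvec(self.w_uk[i], c) for c in self.cache]  # rebuilt from latents
            vals = [matvec(self.w_uv[i], c) for c in self.cache]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D_HEAD) for k in keys]
            weights = softmax(scores)
            out.extend(sum(w * v[j] for w, v in zip(weights, vals)) for j in range(D_HEAD))
        return out

def cache_floats_per_token(scheme, n_heads, d_head, n_kv_heads=None, d_latent=None):
    """Per-layer KV-cache entries per token (simplified; ignores any RoPE key)."""
    if scheme == "mha":
        return 2 * n_heads * d_head      # K and V for every head
    if scheme == "gqa":
        return 2 * n_kv_heads * d_head   # K and V for each shared KV head
    if scheme == "mla":
        return d_latent                  # a single latent vector
    raise ValueError(f"unknown scheme: {scheme}")

attn = ToyLatentAttention()
rng = random.Random(1)
for _ in range(5):
    y = attn.step([rng.gauss(0.0, 1.0) for _ in range(D_MODEL)])
print(len(y), len(attn.cache), len(attn.cache[0]))  # prints "8 5 3"
print(cache_floats_per_token("mha", 32, 128),
      cache_floats_per_token("gqa", 32, 128, n_kv_heads=8),
      cache_floats_per_token("mla", 32, 128, d_latent=512))  # prints "8192 2048 512"
```

The design choice this illustrates is trading extra compute (re-deriving K and V from the latent) for a much smaller cache. The 32-head, 128-dim, 512-latent figures in the last line are made up for comparison.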