Understanding Multi-Head Latent Attention (From DeepSeek) (shreyansh26.github.io)
2 points by shreyansh26 143 days ago | 1 comment
shreyansh26 143 days ago
A short deep-dive on DeepSeek's Multi-Head Latent Attention (MLA): intuition and math, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations for KV-cache efficiency.
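To make the KV-cache angle concrete, here is a rough sketch of how many elements each variant caches per token per layer. The head count, head dimension, group count, and MLA latent/RoPE sizes are illustrative assumptions, not numbers taken from the linked post, and the function names are hypothetical. The MLA line assumes the cache holds one shared compressed latent plus a small decoupled RoPE key per token instead of full per-head keys and values.

```python
def kv_cache_elems_per_token(n_kv_heads: int, head_dim: int) -> int:
    """Elements cached per token per layer when K and V are stored per KV head."""
    return 2 * n_kv_heads * head_dim  # factor 2: one key and one value vector


def mla_cache_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """Elements cached per token per layer under the assumed MLA layout:
    one compressed KV latent plus one decoupled RoPE key, shared by all heads."""
    return latent_dim + rope_dim


# Illustrative configuration (assumed, not from the post).
n_heads, head_dim = 32, 128

sizes = {
    "MHA": kv_cache_elems_per_token(n_heads, head_dim),  # one KV head per query head
    "GQA": kv_cache_elems_per_token(8, head_dim),        # 8 KV groups (assumed)
    "MQA": kv_cache_elems_per_token(1, head_dim),        # a single shared KV head
    "MLA": mla_cache_elems_per_token(512, 64),           # latent 512 + RoPE key 64 (assumed)
}

for name, elems in sizes.items():
    print(f"{name}: {elems} elements per token per layer")
```

With these assumed sizes, MLA caches more than MQA but far less than MHA or 8-group GQA, while still letting every head reconstruct its own keys and values from the latent.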