	by shreyansh26 143 days ago
	A short deep-dive on Multi-Head Latent Attention (MLA) (from DeepSeek): intuition + math, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations for KV-cache efficiency.
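	To give a rough sense of what "KV-cache efficiency" means across those four variants, here is a minimal sketch that counts the elements cached per token, per layer. All sizes below are illustrative assumptions, not numbers from the post. The MLA line assumes the cache holds one compressed latent vector plus a small RoPE key shared across heads.

```python
# Illustrative sizes only (assumed for this sketch, not taken from the post).
N_HEADS = 128    # query heads
D_HEAD = 128     # per-head dimension
N_GROUPS = 8     # number of KV heads under GQA
D_LATENT = 512   # width of the MLA compressed KV latent
D_ROPE = 64      # width of the MLA decoupled RoPE key, shared across heads


def cached_elems_per_token():
    """Elements stored in the KV cache per token, per layer."""
    return {
        "MHA": 2 * N_HEADS * D_HEAD,   # full K and V for every head
        "GQA": 2 * N_GROUPS * D_HEAD,  # K and V shared within each group
        "MQA": 2 * D_HEAD,             # one K/V head shared by all query heads
        "MLA": D_LATENT + D_ROPE,      # one latent vector plus a shared RoPE key
    }


sizes = cached_elems_per_token()
for name, n in sizes.items():
    ratio = sizes["MHA"] / n
    print(f"{name}: {n} elements/token/layer ({ratio:.1f}x smaller than MHA)")
```

	With these assumed sizes MLA caches more than MQA but far less than MHA or GQA. Unlike MQA, it can still reconstruct distinct per-head keys and values from the latent.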