by karmakaze
397 days ago
I'm not "in the field", though I like to read about and use LLMs. The video "How DeepSeek Rewrote the Transformer [MLA]"[0] is really good at explaining MHA, MQA, GQA, and MLA with clear visuals/animations, including how DeepSeek's MLA shrinks the KV cache by roughly 57x compared with standard multi-head attention.

[0] https://www.youtube.com/watch?v=0VLAoVGf_74&t=960s
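
For anyone wondering where a figure like 57x can come from, here is a rough back-of-the-envelope sketch in Python that counts the values cached per token per layer under each attention variant. The dimensions are assumptions on my part (128 heads, head dim 128, a 512-dim compressed KV latent plus a 64-dim shared RoPE key, which is what I understand DeepSeek-V3's published config to use), and the 8-group GQA setup is purely hypothetical, so treat this as illustrative rather than as the model's actual implementation.

```python
# Values cached per token, per layer, for each attention variant.
# All dimensions below are assumptions for illustration.
N_HEADS = 128        # assumed number of attention heads
HEAD_DIM = 128       # assumed per-head dimension for keys and values
GQA_KV_HEADS = 8     # hypothetical number of shared K/V heads for GQA
KV_LATENT_DIM = 512  # assumed size of MLA's compressed KV latent
ROPE_DIM = 64        # assumed size of MLA's shared positional (RoPE) key


def mha_cache(n_heads: int, head_dim: int) -> int:
    """MHA caches one key and one value vector for every head."""
    return 2 * n_heads * head_dim


def gqa_cache(n_kv_heads: int, head_dim: int) -> int:
    """GQA caches keys/values only for the shared K/V heads."""
    return 2 * n_kv_heads * head_dim


def mqa_cache(head_dim: int) -> int:
    """MQA is GQA with a single K/V head shared by all query heads."""
    return gqa_cache(1, head_dim)


def mla_cache(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one compressed latent plus one shared RoPE key."""
    return latent_dim + rope_dim


mha = mha_cache(N_HEADS, HEAD_DIM)
gqa = gqa_cache(GQA_KV_HEADS, HEAD_DIM)
mqa = mqa_cache(HEAD_DIM)
mla = mla_cache(KV_LATENT_DIM, ROPE_DIM)
ratio = mha / mla

print(f"MHA: {mha}  GQA: {gqa}  MQA: {mqa}  MLA: {mla}")
print(f"MHA / MLA = {ratio:.1f}x")
```

With these assumed numbers, MHA stores 32768 values per token per layer and MLA stores 576, which works out to about 57x. MQA caches even less than MLA here, but it does so by forcing every head to share the same keys and values.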