by menaerus
513 days ago
I skimmed through the paper quickly. It has no performance data on inference speedups, only benchmarks relevant to training. Interestingly, they also don't compare against FlashAttention, which outperforms all of the other attention mechanisms mentioned in the paper: MHA, MQA, GQA, and MLA.