HN Mirror


	by zackangelo 361 days ago
	In your forward pass section you put a lot of emphasis on FlashAttention, but it might be worth mentioning PagedAttention as well. It was introduced in the paper written by the vLLM authors, and I believe it was the genesis of the project. PA-style block tables are now supported in most fused attention kernels, but vLLM originally came up with the idea, and it's the main reason vLLM has such high throughput!
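
For readers who have not seen a block table before, here is a rough toy sketch of the idea. It is an illustration only, not vLLM's code: `ToyBlockAllocator`, `ToySequence`, `append_token`, `lookup`, and the tiny `BLOCK_SIZE` are all hypothetical names and values made up for this example. The point is just that each sequence maps its logical token positions to fixed-size physical KV-cache blocks, which are handed out on demand and need not be contiguous.

```python
from dataclasses import dataclass, field

# Tokens per KV block. Hypothetical and deliberately tiny for readability.
BLOCK_SIZE = 4


class ToyBlockAllocator:
    """Hands out physical KV-cache block ids from a shared free pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV blocks")
        # Popping from the end means blocks are not handed out in address
        # order, which is fine: the block table hides the physical layout.
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class ToySequence:
    """One request: logical block index -> physical block id."""

    block_table: list = field(default_factory=list)
    num_tokens: int = 0


def append_token(seq: ToySequence, alloc: ToyBlockAllocator) -> tuple:
    """Reserve a KV slot for one new token; return (physical_block, offset)."""
    offset = seq.num_tokens % BLOCK_SIZE
    if offset == 0:
        # The last block is full (or there is none yet): grab a new one.
        seq.block_table.append(alloc.allocate())
    seq.num_tokens += 1
    return seq.block_table[-1], offset


def lookup(seq: ToySequence, token_idx: int) -> tuple:
    """Translate a logical token position to (physical_block, offset)."""
    if not 0 <= token_idx < seq.num_tokens:
        raise IndexError(token_idx)
    return seq.block_table[token_idx // BLOCK_SIZE], token_idx % BLOCK_SIZE


if __name__ == "__main__":
    alloc = ToyBlockAllocator(num_blocks=8)
    seq = ToySequence()
    for _ in range(6):
        append_token(seq, alloc)
    print("block table:", seq.block_table)
    print("token 5 lives at:", lookup(seq, 5))
```

Because memory is reserved one block at a time, a sequence wastes at most one partially filled block instead of a slab sized for the maximum length, so more sequences fit in the same cache. An attention kernel that accepts a block table does the same `lookup` translation when it reads keys and values.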

1 comment

Thank you! We have incorporated your suggestion.