| HN Mirror



	by cold_harbor 30 days ago
	for LLM work, reading the Flash Attention and vLLM kernel source taught me more than any book. real code makes the memory hierarchy concrete; books stay too abstract.

1 comment

dandanua 29 days ago

The story of Flash Attention is the best manifestation of the power and difficulty of GPU programming. This page gives a nice overview of it: https://aiwiki.ai/wiki/flash_attention
