by cold_harbor 30 days ago
for LLM work, reading the Flash Attention and vLLM kernel source taught me more than any book. real code makes the memory hierarchy concrete; books stay too abstract.
1 comment

The story of Flash Attention is the best illustration of both the power and the difficulty of GPU programming. This page gives a nice overview of it: https://aiwiki.ai/wiki/flash_attention