vibe-infer: Learning GPU Programming with Claude Code

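Great writeup. The three-pass softmax (find the max, sum the exponentials, normalize) is where everyone gets stuck. Subtracting the max for numerical stability is one of those things that rarely sinks in from a textbook; it tends to click only once exp() has overflowed on you.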
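
For anyone following along, here is a rough CPU-side sketch of the three passes in plain Python. It is not the WGSL from the post, the function name is my own, and it assumes a non-empty input. The point is only to show why the max gets subtracted: with 32-bit floats, exp(x) overflows to infinity once x is a bit above 88, and the shift keeps every exponent at or below zero.

```python
import math

def softmax_three_pass(xs):
    """Illustrative three-pass softmax over a non-empty list of floats."""
    # Pass 1: find the maximum so every exponent below is <= 0.
    m = xs[0]
    for x in xs[1:]:
        if x > m:
            m = x

    # Pass 2: sum the shifted exponentials; each term is in (0, 1].
    total = 0.0
    for x in xs:
        total += math.exp(x - m)

    # Pass 3: normalize each shifted exponential by the sum.
    return [math.exp(x - m) / total for x in xs]


if __name__ == "__main__":
    # Inputs this large would overflow a naive exp(x) / sum(exp(x)).
    print(softmax_three_pass([1000.0, 1000.0, 1000.0]))
```

The shift does not change the result mathematically, because the factor exp(-max) cancels between numerator and denominator. On a GPU, each pass would typically be its own loop or reduction over the row, which is where the "three-pass" name comes from.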

The pain points you hit (byte alignment, dispatch dims, strict typing) make me wonder if there's a sweet spot between raw WGSL and "import pytorch" that keeps you close to the metal without all the papercuts.