|
|
|
|
|
by mr_octopus
122 days ago
|
|
Great writeup. The three-pass softmax is where everyone
gets stuck — subtracting the max for numerical stability
is one of those things you can't learn from a textbook. The pain points you hit (byte alignment, dispatch dims,
strict typing) make me wonder if there's a sweet spot
between raw WGSL and "import pytorch" that keeps you
close to the metal without all the papercuts. |
|