by shreyansh26
127 days ago
What’s covered:
1. Hierarchical scans: block-local scan → write block totals → scan totals → carry-in add
2. Single-pass scans: the "domino" idea, and why naive inter-block propagation can stall / deadlock without the right coordination
3. Decoupled lookbacks: how modern single-pass scans coordinate across blocks safely
4. Warp-window lookback optimization: scanning lookback metadata in warp-sized chunks (and why it helps)

I also include H100 timings and compare against CUB for context.
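
For readers who want the four hierarchical steps from item 1 in concrete form, here is a minimal CPU-side sketch in Python. It is only an illustration of the data flow, not the GPU kernels from the post: the function name `hierarchical_inclusive_scan` and the `block_size` parameter are hypothetical, and each "block" is simply a list slice processed sequentially rather than a CUDA thread block.

```python
from itertools import accumulate


def hierarchical_inclusive_scan(values, block_size=4):
    """Inclusive prefix sum computed in the hierarchical (multi-pass) style.

    Hypothetical helper for illustration; blocks are list slices, not GPU blocks.
    """
    # Step 1: block-local scan. Each block scans its own slice independently.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    local_scans = [list(accumulate(block)) for block in blocks]

    # Step 2: write block totals. The last element of each local scan is the block sum.
    block_totals = [scan[-1] for scan in local_scans]

    # Step 3: scan totals. An exclusive scan of the totals gives each block's carry-in.
    carry_ins = [0] + list(accumulate(block_totals))[:-1]

    # Step 4: carry-in add. Every element is offset by the sum of all preceding blocks.
    return [x + carry for scan, carry in zip(local_scans, carry_ins) for x in scan]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(hierarchical_inclusive_scan(data, block_size=4))
```

On a GPU, steps 1 and 4 would typically run as separate kernel launches over all blocks, with step 3 as a small scan in between; the single-pass approaches in items 2 to 4 aim to avoid that extra round trip through memory.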