by shreyansh26 127 days ago
What’s covered:

1. Hierarchical scans: block-local scan → write block totals → scan totals → carry-in add

2. Single-pass scans: the "domino" idea, and why naive inter-block propagation can stall / deadlock without the right coordination

3. Decoupled lookbacks: how modern single-pass scans coordinate across blocks safely

4. Warp-window lookback optimization: scanning lookback metadata in warp-sized chunks (and why it helps)
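To make the four steps of item 1 concrete, here is a minimal sequential sketch in Python. It is not GPU code and not the implementation from the write-up: each "block" is just a list slice, and the function name `hierarchical_inclusive_scan` and the `block_size` parameter are hypothetical names chosen for illustration. On a GPU, steps 1 and 4 would run in parallel across thread blocks, with step 3 as a separate small scan over the block totals.

```python
from itertools import accumulate


def hierarchical_inclusive_scan(values, block_size=4):
    """Inclusive prefix sum computed the hierarchical way (CPU simulation).

    Illustrative only: blocks are processed one after another here,
    whereas a GPU would process them concurrently.
    """
    # Step 1: block-local inclusive scan (each block scans its own slice).
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    local_scans = [list(accumulate(block)) for block in blocks]

    # Step 2: write block totals (the last element of each local scan).
    block_totals = [scan[-1] for scan in local_scans]

    # Step 3: scan the totals. An exclusive scan gives each block the sum
    # of everything that comes before it, which is its carry-in.
    carry_ins = [0] + list(accumulate(block_totals))[:-1]

    # Step 4: carry-in add (offset every local result by the block's carry-in).
    return [x + carry for scan, carry in zip(local_scans, carry_ins) for x in scan]


if __name__ == "__main__":
    data = list(range(1, 11))
    print(hierarchical_inclusive_scan(data, block_size=4))
```

The result matches a plain running sum for any block size, including when the last block is only partially filled. The cost of this structure on real hardware is the extra pass over the data in step 4, which is what the single-pass approaches in items 2 to 4 aim to avoid.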

I also include H100 timings and compare against CUB for context.