by dragontamer 1452 days ago
> In AI workloads these days, how to schedule thousands of parallel threads (SIMD style) becomes more and more interesting; I wish someone had a good write-up on that topic.

GPU shaders have a very simple scheduler. Blocks are scheduled one block at a time, by what is likely a very simple heuristic (probably lowest index to highest index, though nominally you don't know what order they're executed in).
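To make the "one block at a time, in index order" idea concrete, here's a rough host-side model in plain C++. This is only a sketch: `LaunchConfig`, `launchInOrder` and `doubleAll` are names I made up for illustration, not any real GPU API, and the strict ascending order is an assumption. Real hardware runs many blocks concurrently and doesn't promise any order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a kernel launch: numBlocks blocks of
// threadsPerBlock threads each. Illustrative only, not a real API.
struct LaunchConfig {
    std::size_t numBlocks;
    std::size_t threadsPerBlock;
};

// Calls kernel(blockIdx, threadIdx) for every thread, handing out whole
// blocks in ascending index order. ASSUMPTION: real schedulers overlap
// blocks across compute units and do not guarantee this order.
template <typename Kernel>
void launchInOrder(const LaunchConfig& cfg, Kernel kernel) {
    for (std::size_t block = 0; block < cfg.numBlocks; ++block) {
        for (std::size_t thread = 0; thread < cfg.threadsPerBlock; ++thread) {
            kernel(block, thread);
        }
    }
}

// Example "kernel": each thread doubles one element of the input.
std::vector<int> doubleAll(const std::vector<int>& input,
                           std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    std::vector<int> output(input.size());
    // Round up so the last, partially filled block still gets launched.
    const std::size_t numBlocks =
        (input.size() + threadsPerBlock - 1) / threadsPerBlock;
    const LaunchConfig cfg{numBlocks, threadsPerBlock};
    launchInOrder(cfg, [&](std::size_t block, std::size_t thread) {
        // Global index, computed the way a kernel would compute it.
        const std::size_t i = block * threadsPerBlock + thread;
        if (i < input.size()) {  // guard the threads past the end
            output[i] = 2 * input[i];
        }
    });
    return output;
}
```

The thing to notice is that the kernel body only ever sees its block and thread index. Since it can't rely on the order, each block's work has to be independent of every other block's.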

That's good enough in 90% of cases. The last 10% can be served by a variety of techniques, including long-running kernels, data-structure queues, dynamic parallelism, and so on. But mastering the basic kernel invocation in the classic "matrix multiplication" scenario (which roughly lines up with OpenMP "static scheduling"), and understanding its benefits, is a good idea.
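For the OpenMP side of that comparison, a minimal sketch of a statically scheduled matrix multiply in C++ might look like this. `matmulStatic` is a name I picked for the example, and the row-major square-matrix layout is an assumption to keep it short. Each output row plays roughly the role a block would play on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies two n x n row-major matrices and returns c = a * b.
// matmulStatic is an illustrative name, not a library function.
std::vector<double> matmulStatic(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    const std::ptrdiff_t rows = static_cast<std::ptrdiff_t>(n);

    // schedule(static) divides the rows into fixed chunks that are assigned
    // to threads up front, with no runtime work stealing. When the code is
    // built without OpenMP the pragma is skipped and the loop runs serially
    // with the same result.
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (std::ptrdiff_t r = 0; r < rows; ++r) {
        const std::size_t row = static_cast<std::size_t>(r);
        for (std::size_t col = 0; col < n; ++col) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row * n + k] * b[k * n + col];
            }
            // Each iteration writes its own rows of c, so no locking is needed.
            c[row * n + col] = sum;
        }
    }
    return c;
}
```

Static scheduling works well here because every row costs the same amount of work, so a fixed up-front split stays balanced. The queue-based and dynamic techniques start to pay off when the work per item is uneven.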