It would be interesting to see how you were testing for that, because at least on AMD it's fairly certain that a single thread can be shading multiple primitives. For example, according to the ISA docs [1], pixel waves are preloaded with an SGPR containing a bit mask indicating just that:

> The new_prim_mask is a 15-bit mask with one bit per quad; a one in this mask indicates that this quad begins a new primitive, a zero indicates it uses the same primitive as the previous quad. The mask is 15 bits, not 16, since the first quad in a wavefront begins a new primitive and so it is not included in the mask

The mask is used by the interp instructions to load the correct interpolants from local memory. In fact, in the (older) GCN3 docs [2] there is a diagram showing the memory layout of attributes from multiple primitives for a single wavefront (page 99).

That being said, of course I expect this process to be "lazy": you would not want to buffer execution of a partially filled thread forever, so depending on the workload you might measure different things.

[1] https://developer.amd.com/wp-content/resources/RDNA2_Shader_...

[2] http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_In...
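
To make the new_prim_mask description concrete, here is a rough Python sketch of how such a mask could be decoded into a per-quad primitive index. This is only an illustration of the quoted text, not how the hardware or driver does it. The function name and the bit ordering (bit 0 describing quad 1) are my own assumptions; the quote does not say which bit maps to which quad. It also assumes a wave of 16 quads, which is what a 15-bit mask implies.

```python
QUADS_PER_WAVE = 16  # assumption: 16 quads per wave, implied by the 15-bit mask


def quad_primitive_indices(new_prim_mask):
    """Return, for each quad in the wave, the index of the primitive it shades.

    Hypothetical helper for illustration only.
    """
    if not 0 <= new_prim_mask < (1 << (QUADS_PER_WAVE - 1)):
        raise ValueError("new_prim_mask must fit in 15 bits")

    # Quad 0 always begins a new primitive, so it has no bit in the mask.
    indices = [0]
    for quad in range(1, QUADS_PER_WAVE):
        # ASSUMPTION: bit (quad - 1) describes quad `quad`.
        starts_new_primitive = (new_prim_mask >> (quad - 1)) & 1
        # A set bit starts a new primitive; a clear bit reuses the previous one.
        indices.append(indices[-1] + starts_new_primitive)
    return indices


def primitives_in_wave(new_prim_mask):
    """Number of distinct primitives shaded by one wave."""
    return quad_primitive_indices(new_prim_mask)[-1] + 1
```

With a mask of 0 every quad belongs to the same primitive; any set bit means the wave is shading more than one primitive, which is the situation described above.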