by thechao
1086 days ago
Yeah, my favorite instructions he added were `fmad233` and `faddsets`. The former essentially bootstraps the line equation for rasterization mask generation, and the latter lets you 'step' the intersection. You could plumb the valid mask through and get the logical intersection "for free". This let us compute the covering mask in 9 + N instructions for N+1 4x4 tiles. We optimized tile load-store to work in 16x16 chunks, so valid mask generation came to just 24 cycles. My argument for using boustrophedon order and just blasting the tile (rather than the quad-tree descent he designed) is what convinced him to let me work with RAD and do the non-polygon path for LRB.
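For anyone who hasn't seen the setup-then-step idea before, here is a rough scalar sketch in Python. It is not the LRB code and does not model what `fmad233` or `faddsets` actually do. It only shows the general shape: evaluate each edge's line equation once over a 4x4 tile, then get every later tile by adding a constant, ANDing the per-edge sign masks together as you go. Every name here (`edge`, `tile_values`, `mask_of`, `boustrophedon`, `coverage`) is made up for illustration, as are the choices to sample at integer pixel coordinates, to treat `E >= 0` as inside, and to skip any fill rule.

```python
TILE = 4  # 4x4 pixel tiles, as in the comment above


def edge(ax, ay, bx, by):
    """Coefficients (A, B, C) of E(x, y) = A*x + B*y + C for the edge a -> b.

    E is zero on the edge and positive on the interior side when the
    triangle's vertices are given counter-clockwise.
    """
    return ay - by, bx - ax, ax * by - ay * bx


def tile_values(coeffs, x0, y0):
    """Evaluate one edge equation at the 16 pixels of the tile at (x0, y0)."""
    A, B, C = coeffs
    return [A * (x0 + i) + B * (y0 + j) + C
            for j in range(TILE) for i in range(TILE)]


def mask_of(values, valid=0xFFFF):
    """16-bit mask of non-negative values, intersected with a valid mask."""
    m = 0
    for bit, v in enumerate(values):
        if v >= 0:
            m |= 1 << bit
    return m & valid


def boustrophedon(tiles_x, tiles_y):
    """Yield tile coordinates row by row, reversing direction on each row."""
    for ty in range(tiles_y):
        xs = range(tiles_x) if ty % 2 == 0 else range(tiles_x - 1, -1, -1)
        for tx in xs:
            yield tx, ty


def coverage(tri, tiles_x, tiles_y):
    """Coverage mask per tile, using one full evaluation and then steps."""
    edges = [edge(*tri[i], *tri[(i + 1) % 3]) for i in range(3)]
    vals = [tile_values(e, 0, 0) for e in edges]  # setup: done once
    out = {}
    prev = (0, 0)
    for tx, ty in boustrophedon(tiles_x, tiles_y):
        dx, dy = (tx - prev[0]) * TILE, (ty - prev[1]) * TILE
        for k, (A, B, _) in enumerate(edges):
            step = A * dx + B * dy  # constant offset to reach the next tile
            vals[k] = [v + step for v in vals[k]]
        mask = 0xFFFF
        for v in vals:
            mask = mask_of(v, mask)  # thread the valid mask through each edge
        out[(tx, ty)] = mask
        prev = (tx, ty)
    return out
```

The serpentine order is what keeps the step cheap in this sketch: consecutive tiles are always neighbors, so moving on is a single add per edge rather than a fresh evaluation.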