| That's mostly correct. It is as you say, except that shading and texturing come for free. You may be thinking of Playstation where you do indeed get decreased fillrate when texturing is on. Now, if you enable 2cycle mode, the pipeline will recycle the pixel value back into the pipeline for a second stage, which is used for 2 texture lookups per pixel and some other blending options. Otherwise, the RDP is always outputting 1 pixel per clock at 62.5 mhz. (Though it will be frequently interrupted because of ram contention) There are faster drawing modes but they are for drawing rectangles, not triangles. It's been a long time since I've done benchmarks on the pipeline though. You're exactly right that the UMA plus high latency murders it. It really does. Enable zbuffer? Now the poor RDP is thrashing read modify writes and you only get 8 pixel chunks at a time. Span caching is minimal. Simply using zbuf will torpedo your effective full rate by 20 to 40 percent. That's why stuff I wrote for it avoided using the zbuffer whenever possible. The other bandwidth hog was enable anti aliasing. AA processing happened in 2 places: first in the triangle drawing pipeline, for inside polygon edges. Secondly, in the VI when the framebuffer gets displayed, it will apply smoothing to the exterior polygon edges based on coverage information stored in the pixels extra bits. On average, you get a roughly 15 to 20 percent fillrate boost by turning both those off. If you run only at lowres, it's a bit less since more of your tender time is occupied by triangle setup. |
Another example from that video was changing a trig function from a lookup table to an evaluated approximation improved performance because it uses less memory bandwidth.
Was the zbuffer in main memory? Ooof
What's interesting to me is that even Kaze's optimized stuff is around 8k triangles per frame at 30fps. The "accurate" microcode Nintendo shipped claimed about 100k triangles per second. Was that ever achieved, even in a tech demo?