Hacker News
by quadcore 1511 days ago
Pretty sure overdraw / fillrate becomes the bottleneck before vertex processing does. Also, you could draw that quad as a strip, which would then amount to only one more processed vertex compared to the triangle.

Edit: okay, surely on modern architectures there is no pixel write because of some early alpha cut, but you still have to fetch the texture to decide that, so texture fetch (memory) will bottleneck first. I guess.
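
For reference, the vertex counts behind the strip remark can be sketched as below. This is only bookkeeping for non-indexed draws; the function names are made up for illustration, and it says nothing about which topology is actually faster on a given GPU.

```python
def strip_vertices(num_triangles: int) -> int:
    """Vertices submitted for a triangle strip.

    The first triangle needs 3 vertices; every further triangle reuses
    the previous two and adds 1.
    """
    if num_triangles < 1:
        return 0
    return num_triangles + 2


def list_vertices(num_triangles: int) -> int:
    """Vertices submitted for a non-indexed triangle list (3 per triangle)."""
    return 3 * num_triangles


single_triangle = strip_vertices(1)  # one big triangle covering the sprite
quad_as_strip = strip_vertices(2)    # two triangles sharing an edge
quad_as_list = list_vertices(2)      # two independent triangles

print(single_triangle, quad_as_strip, quad_as_list)
```

An indexed triangle list for the same quad would reference 4 unique vertices through 6 indices; how many of those actually get shaded depends on the hardware's vertex reuse.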

2 comments

You shouldn't use strips, they're slower than triangle lists on most GPUs.

If by alpha cut you mean "discard", that's going to be much slower than two triangles. Two triangles will have a tiny bit of quad overshading on the seam, compared to a full extra triangle's worth in the alpha cut case.

Yeah, discard used to be slow because it flushes pipelines or messes with branch prediction, I don't remember which. I just assumed they "fixed" that by now.
No, it's not either of those, it's just launching useless threads, plus all the down-stream effects of launching useless threads, e.g. if you have blending on, that will block the ROP unit which needs to wait for the threads for a given pixel in-order. If you have depth write on, that will move the write to late-Z.

More vertices is not a big problem. Doubling your vertex count is not a big deal, since most GPUs process vertices in groups of 32 or more, and whether multiple instances get packed into the same group depends on the GPU vendor.
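
A rough way to picture the grouping argument, as a toy model only: the group size of 32 is taken from the comment above, and `groups_needed` and the `packs_instances` flag are hypothetical names standing in for vendor-specific behaviour, not a description of any real GPU.

```python
import math

GROUP_SIZE = 32  # assumed from the comment above; real hardware varies


def groups_needed(num_sprites: int, verts_per_sprite: int,
                  packs_instances: bool, group_size: int = GROUP_SIZE) -> int:
    """Count vertex-shader groups launched under a deliberately simple model.

    packs_instances=True:  vertices from different sprites share a group.
    packs_instances=False: every sprite starts a fresh group.
    """
    if packs_instances:
        return math.ceil(num_sprites * verts_per_sprite / group_size)
    return num_sprites * math.ceil(verts_per_sprite / group_size)


# One sprite: 3, 4 or 6 vertices all fit in a single group.
one_sprite = [groups_needed(1, v, False) for v in (3, 4, 6)]

# 100 sprites without packing: one mostly empty group per sprite either way.
unpacked = [groups_needed(100, v, False) for v in (3, 6)]

# 100 sprites with packing: group count scales with total vertex count.
packed = [groups_needed(100, v, True) for v in (3, 6)]

print(one_sprite, unpacked, packed)
```

Under this model the vertex count only shows up in the cost when instances are packed together, which is the vendor-dependent part.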

By this argument you should get higher performance from higher-poly models... which clearly isn't the case?
Oh, let me clear that up for you. The trick discussed here is that you can draw a sprite (a quad) using one large triangle. The sprite sits just inside it, but the triangle has quite a bit of "wasted" surface.
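
To put a number on the "wasted" surface, here is a minimal sketch of one way to build such a triangle for an axis-aligned sprite. The construction (a right triangle with legs twice the sprite's width and height) and all function names are assumptions for illustration; the thread does not say which triangle is used.

```python
def covering_triangle(x, y, w, h):
    """Right triangle that contains the sprite rectangle (x, y, w, h).

    The right angle sits on the sprite's (x, y) corner and the hypotenuse
    passes through the opposite corner (x + w, y + h).
    """
    return [(x, y), (x + 2 * w, y), (x, y + 2 * h)]


def cross(o, u, v):
    """2D cross product of (u - o) and (v - o)."""
    return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])


def triangle_area(tri):
    a, b, c = tri
    return abs(cross(a, b, c)) / 2


def inside(p, tri):
    """True if p is inside the triangle or on its boundary."""
    a, b, c = tri
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (any(s < 0 for s in d) and any(s > 0 for s in d))


x, y, w, h = 0, 0, 4, 2
tri = covering_triangle(x, y, w, h)
corners = [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]

sprite_area = w * h
wasted_area = triangle_area(tri) - sprite_area

print(all(inside(c, tri) for c in corners), sprite_area, wasted_area)
```

With this construction the triangle has twice the sprite's area, so the region outside the sprite is as large as the sprite itself.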