Hacker News
by chrisseaton 1511 days ago
> Isn't this akin to 400k triangles on a GPU?

Is it faster to render two triangles with slightly less area, or one triangle with slightly more area, to draw the same sprite?

3 comments

Rendering only one large triangle can be faster than two. First, one triangle needs less memory, less vertex processing, etc.

Second, modern GPUs render pixels in groups ("tiles") of 2x2 up to 8x8. If only one pixel from a group is part of a triangle, the entire group will be rendered. When two triangles form a quad, the entire area along the diagonal "seam" will be rendered twice. The smaller your quads, the larger the relative overhead.
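
To put rough numbers on the seam cost, here is a minimal Python sketch. It assumes the GPU shades every touched 2x2 quad in full, that the sprite is aligned to the quad grid, and that the diagonal runs from (0,0) to (w,h). `quad_lanes` and `two_triangle_lanes` are hypothetical helper names for illustration, not part of any real API.

```python
def quad_lanes(pixels):
    """Shader lanes launched if every touched 2x2 quad is shaded in full (assumption)."""
    return 4 * len({(x // 2, y // 2) for x, y in pixels})


def two_triangle_lanes(w, h):
    """Split a w x h sprite along the (0,0)->(w,h) diagonal and count lanes per triangle."""
    lower, upper = [], []
    for y in range(h):
        for x in range(w):
            # Sign of the cross product of the diagonal with the pixel centre
            # (x + 0.5, y + 0.5), scaled by 2 to stay in integers.
            side = w * (2 * y + 1) - h * (2 * x + 1)
            # Ties go to the lower triangle so each pixel is owned exactly once.
            (upper if side > 0 else lower).append((x, y))
    return quad_lanes(lower) + quad_lanes(upper)


if __name__ == "__main__":
    for size in (2, 8, 32):
        lanes = two_triangle_lanes(size, size)
        print(size, lanes, lanes / (size * size))
```

Under these assumptions the ratio of launched lanes to sprite pixels shrinks as the sprite grows, which is the "smaller quads, more overhead" effect described above. Real hardware uses different tile sizes and fill rules, so treat the numbers as illustrative only.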

Also see https://www.saschawillems.de/blog/2016/08/13/vulkan-tutorial...

I disagree, with the exception of the case you link to where half the pixels are outside the viewport or maybe where a sufficient percentage are outside the viewport.

> When two triangles form a quad, the entire area along the diagonal "seam" will be rendered twice

This may be true, but I'm pretty sure that this is more than made up for by the additional pixels in the single triangle circumscribing the quad. In fact, I'm willing to bet that it's a mathematical certainty for any rectangle, although I didn't do enough of the math to prove it.
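
A sketch of the area side of that argument, under a simplifying assumption: the w x h sprite sits in the right-angle corner of a right triangle with legs a and b (my own setup, not something stated in the thread). The sprite's far corner (w, h) must lie on or below the hypotenuse, and the AM-GM inequality then bounds the triangle's area:

```latex
\[
\frac{w}{a} + \frac{h}{b} \le 1
\quad\text{and}\quad
\frac{w}{a} + \frac{h}{b} \ge 2\sqrt{\frac{wh}{ab}}
\;\Longrightarrow\;
ab \ge 4wh
\;\Longrightarrow\;
A_{\triangle} = \tfrac{1}{2}\,ab \ge 2wh .
\]
```

Equality holds at a = 2w, b = 2h, so in this family the single triangle covers at least twice the sprite's area, i.e. at least one extra sprite's worth of pixels. I believe the same factor of two holds for any triangle (the classic result that an inscribed rectangle covers at most half of a triangle's area), but the derivation here only covers the right-triangle case.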

Instead, I would say that most rendering, especially of hundreds of thousands of 2D shapes, is going to be pixel-limited. So trading pixels for vertices is a poor trade.

It depends on the size of the sprites in this case. Small sprites will benefit from being drawn as single triangles.

These "shadow" pixel shader invocations are a very real pain when it comes to rendering highly detailed models. The hardware rasterization pipeline can't cope well with huge amounts of really tiny triangles. That's the reason why UE5 Nanite uses a software GPU rasterizer for the high geometry density sections of a model - it's faster! Large area primitives will be rendered normally AFAIK.

Pretty sure overdraw / fillrate bottlenecks before vertex processing. Also, you could draw that quad using strips, which would then amount to only one more processed vertex compared to a single triangle.

Edit: okay, surely with modern architectures there is no pixel write because of some early alpha cut, but you still have to fetch the texture to decide that, so texture fetch (memory) will bottleneck first. I guess.

You shouldn't use strips, they're slower than triangle lists on most GPUs.

If by alpha cut you mean "discard", that's going to be much slower than two triangles. Two triangles will have a tiny bit of quad overshading on the seam, compared to a full extra triangle's worth in the alpha cut case.

Yeah, discard used to be slow because it flushes pipelines or messes with branch prediction, I don't remember which. I just assumed they "fixed" that by now.

No, it's neither of those. It's just launching useless threads, plus all the downstream effects of launching useless threads. For example, if you have blending on, that will block the ROP unit, which needs to wait for the threads for a given pixel in order. If you have depth write on, that will move the write to late-Z.

More vertices are not a big problem. Doubling your vertex count is not a big deal, since most GPUs process vertices in groups of 32 or more, and whether multiple instances get packed into the same group depends on the GPU vendor.
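
A back-of-the-envelope sketch of that grouping argument in Python. The group size of 32 comes from the comment above. The 400k sprite count is borrowed from the quoted question, and the two packing modes are a simplification I am assuming, not a description of any particular vendor's hardware. `vertex_groups` is a hypothetical helper.

```python
GROUP = 32  # assumed vertex group (wave) size, per the comment above


def vertex_groups(instances, verts_per_instance, packed):
    """Number of vertex groups launched under a simplified packing model."""
    if packed:
        # Vertices from different instances share groups.
        total = instances * verts_per_instance
        return -(-total // GROUP)  # integer ceiling division
    # Each instance gets its own (mostly empty) group or groups.
    return instances * -(-verts_per_instance // GROUP)


if __name__ == "__main__":
    for verts in (3, 4):
        print(verts,
              vertex_groups(400_000, verts, packed=True),
              vertex_groups(400_000, verts, packed=False))
```

In this model, going from 3 to 4 vertices per sprite costs nothing when instances are not packed, since each instance occupies one group either way, and costs about a third more groups when they are packed.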

By this argument you should get higher performance from higher-poly models... which clearly isn't the case?

Oh, let me clear that up for you. The trick discussed here is that you can draw a sprite (a quad) using one large triangle. The sprite sits just inside it, but the triangle has quite some "wasted" surface.

Honestly I'm not sure.

I don't think it will matter much at the 200k or 400k level. The math is probably easier on humans if you think of the sprites as rectangular (so two triangles), but you could in principle make each sprite a triangle and, in a shader, texture-map a rectangular area of that triangle.
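
A minimal sketch of that mapping, written in Python rather than shader code. It assumes a triangle with corners (0,0), (2w,0), (0,2h) carrying UVs (0,0), (2,0), (0,2), so the sprite occupies the region where both u and v are at most 1. `sprite_uv` is a hypothetical name. In a real shader, the "outside the sprite" branch would be a discard or a transparent texel, which other comments in this thread suggest has its own cost.

```python
def sprite_uv(px, py, w, h):
    """Return the interpolated (u, v) at point (px, py), or None outside the sprite.

    Assumed triangle: corners (0,0), (2w,0), (0,2h) with UVs (0,0), (2,0), (0,2).
    """
    # Barycentric weights of the two non-origin corners.
    b1 = px / (2 * w)
    b2 = py / (2 * h)
    b0 = 1 - b1 - b2
    if min(b0, b1, b2) < 0:
        return None  # outside the triangle entirely
    # Interpolate the per-vertex UVs: 0*b0 + 2*b1 and 0*b0 + 2*b2.
    u, v = 2 * b1, 2 * b2
    if u > 1 or v > 1:
        return None  # inside the triangle but outside the sprite rectangle
    return u, v


if __name__ == "__main__":
    print(sprite_uv(8, 4, 16, 8))   # centre of a 16 x 8 sprite
    print(sprite_uv(20, 2, 16, 8))  # in the triangle, past the sprite's right edge
```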