by oflordal 1795 days ago
No, this is about HW architectures. While they are likely evolving towards one another, there are tile-based (like Imagination and ARM Mali) and immediate-mode (Nvidia, AMD) architectures that both implement the same APIs (OpenGL, Vulkan, etc.). All these HW architectures are modern and in use.
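
To make the distinction concrete, here is a rough toy model of the two processing orders, not a description of any real GPU. The tile size, the bounding-box primitives, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

TILE = 16  # hypothetical tile size in pixels


def tiles_touched(bbox, tile=TILE):
    """Yield (tx, ty) for every tile overlapped by an inclusive pixel bbox."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            yield (tx, ty)


def immediate_mode_order(prims):
    """Process primitives in submission order, whichever tiles they hit."""
    return [(i, t) for i, bbox in enumerate(prims) for t in tiles_touched(bbox)]


def tile_based_order(prims):
    """Bin every primitive first, then process the frame tile by tile."""
    bins = defaultdict(list)
    for i, bbox in enumerate(prims):
        for t in tiles_touched(bbox):
            bins[t].append(i)
    return [(i, t) for t in sorted(bins) for i in bins[t]]


# Two primitives given as (x0, y0, x1, y1) bounding boxes.
prims = [(0, 0, 31, 15), (0, 0, 15, 15)]
print(immediate_mode_order(prims))
print(tile_based_order(prims))
```

Both functions do the same work as (primitive, tile) pairs. Only the order differs, and that ordering is what the tests discussed in this thread try to observe.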

Basically all modern GPU architectures implement tiled rasterization. NVIDIA has been doing it since Maxwell (2014) and AMD has been doing it since Vega (2017). Even Intel has been doing it for a few years now starting with their Gen 11 (2019) GPUs.
Those are going to require some serious citations. I'm quite sure most desktop GPUs don't run as tiled renderers at least under normal circumstances.
> Specifically, Maxwell and Pascal use tile-based immediate-mode rasterizers that buffer pixel output, instead of conventional full-screen immediate-mode rasterizers.

https://www.realworldtech.com/tile-based-rasterization-nvidi...

He describes it as "tile-based immediate mode" in the article and the video should go into more detail about it. It's been a while since I watched it.

The parent article already discusses that article, saying those GPUs don't use TBR in areas where the primitive count is too high or something:

> Another class of hybrid architecture is one that is often referred to as tile-based immediate-mode rendering. As dissected in this article[1], this hybrid architecture is used since NVIDIA’s Maxwell GPUs. Does that mean that this architecture is like a TBR one, or that it shares all benefits of both worlds? Well, not really…

> What the article and the video fails to show is what happens when you increase the primitive count. Guillemot’s test application doesn’t support large primitive counts, but the effect is already visible if we crank up both the primitive and attribute count. After a certain threshold it can be noted that not all primitives are rasterized within a tile before the GPU starts rasterizing the next tile, thus we’re clearly not talking about a traditional TBR architecture.

[1] https://www.realworldtech.com/tile-based-rasterization-nvidi...

Classic TBDRs typically require multiple passes on tiles with large primitive counts as well. Each tile's buffer containing binned geometry generally has a max size, with multiple passes required if that buffer size is exceeded.
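
As a rough sketch of that overflow behavior: the model below assumes each tile has its own fixed-capacity bin that is flushed (rendered) when full. `count_tile_passes` and `bin_capacity` are hypothetical names, and real hardware may flush a shared parameter buffer rather than a single tile's bin.

```python
from collections import defaultdict


def count_tile_passes(prim_tiles, bin_capacity):
    """Count render passes per tile under a simplified bin-overflow model.

    prim_tiles: one iterable of tile ids per primitive, in submission order.
    bin_capacity: assumed maximum number of primitives a tile's bin can hold.
    """
    pending = defaultdict(int)  # primitives currently binned per tile
    passes = defaultdict(int)   # passes already performed per tile
    for tiles in prim_tiles:
        for t in tiles:
            if pending[t] == bin_capacity:
                # Bin is full: render what is there before accepting more.
                passes[t] += 1
                pending[t] = 0
            pending[t] += 1
    # End of frame: every tile with leftover geometry needs one final pass.
    for t, n in pending.items():
        if n:
            passes[t] += 1
    return dict(passes)


# Five primitives land on tile "a", one on tile "b".
frame = [["a"]] * 5 + [["b"]]
print(count_tile_passes(frame, bin_capacity=2))
print(count_tile_passes(frame, bin_capacity=10))
```

With a small capacity the busy tile is visited several times, so seeing a tile revisited does not by itself rule out a tiler.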
Yeah, please see https://news.ycombinator.com/item?id=27898421

Having watched the video, I'm fairly certain what is being observed is not really tiled.

I'm not, however, sure what "tile-based immediate-mode rasterizers that buffer pixel output" means, but I think that's enough qualifications to make it somewhat meaningless. All modern GPUs dispatch thread groups that could look like "tiles" and have plenty of buffers, likely including buffers between fragment output and render target output/color blending, but that doesn't make it a tiled/deferred renderer.

Section 5.2 of Intel's Gen11 architecture manual [1]

(yes, PTBR is only enabled on passes the driver thinks will benefit from it)

[1] https://software.intel.com/content/dam/develop/external/us/e...

AMD has even talked publicly about how their rasterizer can run in a TBDR mode that they call DSBR.

https://pcper.com/2017/01/amd-vega-gpu-architecture-preview-...

Interestingly, Nvidia has been using tile based rasterizers for a bit too. https://www.techpowerup.com/231129/on-nvidias-tile-based-ren...
It's been often quoted that Nvidia has switched to tile based for their Desktop renderers, but I haven't seen a source that confirms this. I suspect this is speculation due to changes in raster order that produce side-effects that look tiled even though they aren't.
This has been empirically tested on multiple occasions. There is an article on Real World Tech discussing this, and the results have been replicated for newer AMD GPUs as well. I have a little tool for macOS that tests these things out, and the Navi GPU on my MacBook is definitely a tiler (the Gen10 Intel GPU is not).
It's brought up in multiple other comments, so I won't bother going into detail, but the empirical testing is flawed and is actually measuring changes in other details about thread launch behavior.