by pjmlp 950 days ago
The new modern APIs are not to be understood as graphics APIs, rather as GPU APIs, thus using them directly is more akin to writing a graphics device driver than a rendering engine.

Most people are better served by using GL 4.6, DX 11 and such, or a middleware engine.

Even console vendors offer multiple APIs for that reason: not everyone needs all the little details of the GPU.

Unity is supposed to have good DirectX 12 support coming up, by the way.

"Achieving Real Time Ray Tracing on Xbox with Unity and DirectX12"

https://www.youtube.com/watch?v=giaEpbBGc6E


> The new modern APIs are not to be understood as graphics APIs, rather as GPU APIs

I've heard this repeated a lot in internet forums, but my experience working through hundreds of pages of Frank Luna's 800+ page DX12 book before concluding it pointless for 2D was that DX12 is actually fundamentally very similar to DX11 with most of the API focused on graphics rather than general compute. Compute shaders are just one chapter (13) of Luna's book, roughly 40 of the 800+ pages. I did some of the LunarG Vulkan tutorial and browsed the Khronos ref pages and reached a similar conclusion for Vulkan.

I played with CUDA a bit and that's what real GPU programming looks like, almost no mention of graphics for much of the documentation. The "hello world" program isn't drawing a triangle, it's adding two arrays. Whereas a great deal of DX12, Vulkan, etc. is all about pixel formats, pixel shading, swap chains, geometry and tessellation, blending, depth and stencil, mipmaps and cube maps, clip coords, triangle winding and culling, the perspective Z divide, viewports, indexed and instanced draw calls, ... you know, graphics.

But in chapter 9 of the DX12 book, end of 9.4, Luna writes, "Texture atlases can improve performance because it can lead to drawing more geometry with one draw call." So the conclusion I reached is that DX12 doesn't offer some fancy GPU compute way of writing a GPU program that can use 1000 distinct textures to draw 1000 distinct quads using only one CPU function call to launch this GPU program.

Now I've been doing more research and there is some sort of new feature called bindless textures, not covered in Luna's DX12 book, that might accomplish what I want (I'm not sure), but it seems to be Win11 only, WDDM 3.0 only, shader model 6.6 only, very new cards only. With this feature I might be able to set up 1000 distinct integer ids for my 1000 distinct textures, and then, with one single CPU draw call, have those 1000 textures applied to the correct 1000 quads, with no need to pack those 1000 textures into an atlas. Doing more web searching just now, this possibly can also be done in OpenGL on cards that support NV_gpu_shader5, but only semi-recent NVIDIA cards might support this. (I'm finding it difficult to get quick, quality answers to these sorts of questions using either web searches or LLMs.) Anyway, a gamedev forum or DX-focused reddit might be a better place for me to ask these sorts of technical questions.
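To make the "1000 integer ids, one draw call" idea concrete, here is a minimal CPU-side sketch of the per-quad data such a scheme would upload. The record layout (four floats plus one unsigned texture index) and the names `INSTANCE_FORMAT` and `build_instance_buffer` are assumptions for illustration, not part of any API. The GPU side, where a shader would use the index to pick a texture, depends on the API and is not shown.

```python
import struct

# Hypothetical per-quad record: x, y, width, height as 32-bit floats, then a
# 32-bit unsigned texture index. Little-endian, no padding. This layout is an
# assumption for illustration only.
INSTANCE_FORMAT = "<4fI"
INSTANCE_SIZE = struct.calcsize(INSTANCE_FORMAT)


def build_instance_buffer(quads):
    """Pack (x, y, w, h, texture_index) tuples into one contiguous byte buffer."""
    buf = bytearray(INSTANCE_SIZE * len(quads))
    for i, (x, y, w, h, tex) in enumerate(quads):
        struct.pack_into(INSTANCE_FORMAT, buf, i * INSTANCE_SIZE, x, y, w, h, tex)
    return bytes(buf)


# 1000 quads on a 40-wide grid, each referring to its own texture id.
quads = [((i % 40) * 16.0, (i // 40) * 16.0, 16.0, 16.0, i) for i in range(1000)]
instance_buffer = build_instance_buffer(quads)
```

The whole buffer would be uploaded once, and a single instanced draw would consume one record per quad. Whether the shader may index textures by such an id is exactly what bindless support decides.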

If I understand correctly, what you are looking for are mesh shaders and shader work graphs, which allow one to basically do most of the compute stuff on the GPU without having the CPU steering anything, besides setting up the whole chain.

You will need DirectX 12 Ultimate or Vulkan for them.

https://developer.nvidia.com/blog/introduction-turing-mesh-s...

https://microsoft.github.io/DirectX-Specs/d3d/MeshShader.htm...

https://www.khronos.org/blog/mesh-shading-for-vulkan

https://devblogs.microsoft.com/directx/d3d12-work-graphs-pre...

https://gpuopen.com/learn/gpu-work-graphs/gpu-work-graphs-in...

https://gpuopen.com/gpu-work-graphs-in-vulkan/

So I read through the materials on mesh shaders and work graphs and looked at sample code. These won't really work (see below). As I implied previously, it's best to research/discuss these sorts of matters with professional graphics programmers who have experience actually using the technologies under consideration.

So for the sake of future web searchers who discover this thread: there are only two proven ways to efficiently draw thousands of unique textures of different sizes with a single draw call that are actually used by experienced graphics programmers in production code as of 2023.

Proven method #1: Pack these thousands of textures into a texture atlas.

Proven method #2: Use bindless resources, which are still fairly bleeding edge, and will require a fallback to atlases if targeting the PC instead of only high-end consoles (Xbox Series S|X...).

Mesh shaders by themselves won't work: these have similar texture access limitations to the old geometry/tessellation stages they improve upon. A limited, fixed number of textures still must be bound before each draw call (say, 16 or 32 textures, not 1000s), unless bindless resources are used. So mesh shaders must be used with an atlas or with bindless resources.
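A rough sketch of why the fixed binding limit matters: without an atlas or bindless, the draw list has to be split whenever the next quad needs a texture that does not fit in the bound set. The function name `batch_by_texture_limit` and the greedy, order-preserving strategy are assumptions for illustration. Real renderers often sort by texture first.

```python
def batch_by_texture_limit(quad_textures, max_bound):
    """Split quads, in order, into batches that each use at most max_bound textures.

    quad_textures[i] is the texture id used by quad i. Each returned batch is a
    list of quad indices that could share one draw call.
    """
    if max_bound < 1:
        raise ValueError("max_bound must be at least 1")
    batches, current, bound = [], [], set()
    for quad_index, tex in enumerate(quad_textures):
        # A new texture that does not fit forces a new batch (a new draw call).
        if tex not in bound and len(bound) == max_bound:
            batches.append(current)
            current, bound = [], set()
        bound.add(tex)
        current.append(quad_index)
    if current:
        batches.append(current)
    return batches


# 1000 quads with 1000 distinct textures and 16 binding slots.
batches = batch_by_texture_limit(list(range(1000)), 16)
print(len(batches))  # 63 draw calls instead of 1
```

With an atlas, every quad refers to the same single texture, so the same function returns one batch.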

Work graphs by themselves won't work: This feature is bleeding edge shader model 6.8 whereas bindless resources are SM 6.6. (Xbox Series X|S might top out at SM 6.7, I can't find an authoritative answer.) It looks like work graphs might only work well on NVIDIA GPUs and won't work well on Intel GPUs anytime soon (but, again, I'm not knowledgeable enough to say this authoritatively). Furthermore, this feature may have a hard dependency on using bindless to begin with. That is, I can't tell if one is allowed to execute a work graph that binds and unbinds individual texture resources. And if one could do such a thing, it would certainly be slower than using bindless. The cost of bindless is paid "up front" when the textures are uploaded.

Some programmers use Texture2DArray/GL_TEXTURE_2D_ARRAY as an alternative to atlases, but it has two limitations: (1) the max array length (e.g. GL_MAX_ARRAY_TEXTURE_LAYERS) might only be 256 (e.g. for OpenGL 3.0), and (2) all textures must be the same size.
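Those two limitations can be checked up front. A minimal sketch, assuming the layer limit has already been queried from the driver at runtime and is passed in as `max_layers`. The function name is hypothetical.

```python
def can_use_texture_array(sizes, max_layers):
    """Return True if all (width, height) sizes match and the count fits max_layers."""
    if not sizes:
        return True
    same_size = len(set(sizes)) == 1      # limitation (2): one shared size
    fits = len(sizes) <= max_layers       # limitation (1): layer count limit
    return same_size and fits


print(can_use_texture_array([(64, 64)] * 200, 256))              # True
print(can_use_texture_array([(64, 64)] * 1000, 256))             # False: too many layers
print(can_use_texture_array([(64, 64), (32, 128)], 256))         # False: mixed sizes
```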

Finally, for the sake of any web searcher who lands on this thread in the years to come: to pack an atlas well, a good packing algorithm is needed. It's harder to pack triangles than rectangles, but triangles use atlas memory more efficiently, and a good triangle packing will outperform the fancy new bindless rendering. Some open source starting points for packing:

https://github.com/nothings/stb/blob/master/stb_rect_pack.h

https://github.com/ands/trianglepacker
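For a sense of what a rectangle packer does, here is a minimal "shelf" packing sketch. It is a deliberately simple stand-in, not the algorithm used by either library linked above, and the names `shelf_pack` and `uv_rect` are made up for illustration. It places rectangles left to right on horizontal shelves, tallest first, and converts the result to normalized texture coordinates.

```python
def shelf_pack(sizes, atlas_w, atlas_h):
    """Place named (width, height) rectangles on shelves; return name -> (x, y, w, h)."""
    placed = {}
    x = y = shelf_h = 0
    # Tallest first, so each shelf wastes little vertical space.
    for name, (w, h) in sorted(sizes.items(), key=lambda kv: -kv[1][1]):
        if w > atlas_w:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w > atlas_w:           # current shelf is full: start a new one
            y += shelf_h
            x = shelf_h = 0
        if y + h > atlas_h:
            raise ValueError("atlas is full")
        placed[name] = (x, y, w, h)
        x += w
        shelf_h = max(shelf_h, h)
    return placed


def uv_rect(rect, atlas_w, atlas_h):
    """Convert a pixel rectangle to normalized (u0, v0, u1, v1) coordinates."""
    x, y, w, h = rect
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)


sizes = {"a": (64, 64), "b": (32, 32), "c": (100, 16), "d": (64, 48)}
layout = shelf_pack(sizes, 128, 128)
print(layout["d"], uv_rect(layout["d"], 128, 128))
```

Each quad then samples the shared atlas using its own UV rectangle, which is what lets all quads go out in one draw call. Production packers use smarter placement, and usually add padding between entries to avoid bleeding when mipmapping or filtering.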