Hacker News
by adrusi 1404 days ago

    ~2 msec (mouse)
    8 msec (average time we wait for the input to be processed by the game)
    16.6 (game simulation)
    16.6 (rendering code)
    16.6 (GPU is rendering the previous frame, current frame is cached)
    16.6 (GPU rendering)
    8 (average for missing the vsync)
    16.6 (frame caching inside of the display)
    16.6 (redrawing the frame)
    5 (pixel switching)
I'm not very familiar with graphics pipelines, but some of this seems wrong. If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms. You can't start simulating the next tick while rendering the previous tick unless you do some kind of copy-on-write memory management for the entire game state. And with double buffering, the GPU should be writing frame n to the display cable at the same time as it's computing frame n+1, and the display should be writing that frame to its cache buffer at the same time as the GPU writes it to the cable.

By my count that's a whole 50 ms that shouldn't be there.
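To make that count concrete, here is a rough Python tally of the breakdown above. Which three stages count as overlapping is my reading of the argument (an assumption, the comment doesn't list them), and the names `STAGES` and `DISPUTED_FRAMES` are made up for this sketch.

```python
# Stage latencies in milliseconds, copied from the breakdown quoted above.
STAGES = [
    ("mouse", 2.0),
    ("wait for input processing (average)", 8.0),
    ("game simulation", 16.6),
    ("rendering code", 16.6),
    ("GPU rendering previous frame", 16.6),
    ("GPU rendering", 16.6),
    ("missed vsync (average)", 8.0),
    ("frame caching inside the display", 16.6),
    ("redrawing the frame", 16.6),
    ("pixel switching", 5.0),
]

FRAME_MS = 16.6  # one frame at 60fps, as rounded in the breakdown

# Assumption: three full-frame stages are argued to overlap with others
# (sim+render sharing one frame, GPU scanout, display caching).
DISPUTED_FRAMES = 3


def total_ms(stages):
    """Sum the latency of every stage in the list."""
    return sum(ms for _, ms in stages)


quoted = total_ms(STAGES)
disputed = DISPUTED_FRAMES * FRAME_MS

print(f"quoted total: {quoted:.1f} ms")
print(f"disputed:     {disputed:.1f} ms")
print(f"remaining:    {quoted - disputed:.1f} ms")
```

So the quoted total is about 122.6 ms, and the disputed part is 49.8 ms, which is where the "whole 50 ms" comes from.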

From the linked article:

> One thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1.

Maybe modern games do use CoW memory?

> [The GPU] might collect all drawing commands for the whole frame and not start to render anything until all commands are present.

It might, but is this typical behavior? This implies that the GPU would just sit idle if it finished rendering a frame before the CPU finished sending commands to draw the next one — why would it do that?

> Most monitors wait until a new frame was completely transferred before they start to display it adding another frame of latency.

Maybe this is what is meant by the "16.6 (frame caching inside of the display)" item? That might be real then.

7 comments

John Carmack famously said:

“I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?”

https://mobile.twitter.com/id_aa_carmack/status/193480622533...

Games generally don't use copy on write, but they do often explicitly pipeline processing across multiple frames (usually by manually copying the necessary data from sim "owned" memory to render "owned" memory, though varying amounts of double buffering are also used). This was especially true after the transition to multi-core but before the many-core regime of today. When transitioning from a single-threaded engine, it was easier to run what is effectively a single-threaded simulation frame and a single-threaded render frame in parallel than to fully multithread everything. Graphics APIs took a while to support multithreading, as well.

These days game programmers have gotten experienced enough to get closer to fully saturating all cores in both the simulation and render steps, so you sometimes no longer see the two full frames of latency there.
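A toy sketch of that sim/render pipelining, in Python for brevity. This is not how any particular engine does it: `simulate`, `run_pipelined`, and the state layout are all hypothetical, and the two steps run back to back here where a real engine would run them on separate threads. The point is only that the renderer works from a copy of the previous tick, so it always draws one frame behind the simulation.

```python
import copy


def simulate(state, user_input):
    """Advance the sim-owned state by one tick (toy logic)."""
    state["tick"] += 1
    state["x"] += user_input


def run_pipelined(inputs):
    """Return (frame, rendered_tick) pairs for a two-stage pipeline."""
    sim_state = {"tick": 0, "x": 0}   # sim "owned" memory
    render_snapshot = None            # render "owned" memory
    drawn = []
    for frame, user_input in enumerate(inputs, start=1):
        # Render stage: draws the snapshot copied at the end of the last frame.
        if render_snapshot is not None:
            drawn.append((frame, render_snapshot["tick"]))
        # Sim stage: computes this frame's tick (in parallel in a real engine).
        simulate(sim_state, user_input)
        # Hand-off: explicit copy from sim memory to render memory.
        render_snapshot = copy.deepcopy(sim_state)
    return drawn


for frame, tick in run_pipelined([1, 1, 1, 1]):
    print(f"frame {frame} renders sim tick {tick}")
```

Every rendered frame shows the tick from one frame earlier, which is the extra frame of latency this kind of pipelining costs.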

> 16.6 (GPU is rendering the previous frame, current frame is cached)

Not entirely sure what this is about. Maybe some sort of triple buffering is being employed as a way to reduce hitches? If you push the engine really close to the 16 ms limit for each stage of your pipeline, sometimes something out of your control, like the OS deciding to do some heavy background work, will push you over your limit. Without the extra buffer, you will miss your vsync and the user will perceive a very disturbing judder.
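A small Python sketch of why going slightly over budget hurts so much with vsync. It assumes a simplified model where a finished frame is shown at the first vsync at or after it becomes ready; `present_time_ms` is a made-up helper, not a real API.

```python
import math


def present_time_ms(ready_ms, refresh_hz=60.0):
    """Time of the first vsync at or after the frame is ready (simplified model)."""
    interval = 1000.0 / refresh_hz
    return math.ceil(ready_ms / interval) * interval


# A frame that finishes just inside the budget vs. one that runs 1 ms longer.
on_time = present_time_ms(16.0)
late = present_time_ms(17.0)

print(f"ready at 16.0 ms -> shown at {on_time:.2f} ms")
print(f"ready at 17.0 ms -> shown at {late:.2f} ms")
```

Under this model a 1 ms overrun delays the frame by a whole refresh interval, which is the judder described above.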

I agree that it seems like the "game" part of the latency has about 33ms extra, but the source of this breakdown[0] seems to be knowledgeable and includes measurements that corroborate many of the claims. I was surprised, for example, that vsync seemed to add 2 frames of latency rather than 1 in this test.

The total time in this breakdown is in line with the measured total time, so if the source is wrong about the game by claiming it takes longer than it does, they're also claiming that some other stages take less time than they do by basically the same amount. I would bet on the monitor, but I don't have much reason to think they're wrong to begin with.

[0]: http://renderingpipeline.com/2013/09/measuring-input-latency...

> If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms.

It can work this way--e.g. nvidia exposes an 'ultra low latency mode' in their driver that caps prerendered frames to zero--but typically for smoother animation and higher average fps gpus will have a queue of several frames that they're working on, and this is irrespective of how many render targets you have in your swapchain. Danluu's breakdown above is actually correct for the typical case.

---

Thought I'd clarify how this works since there's lots of confusion in this thread. In the early days you would directly write pixels to memory and they'd be picked up by a RAMDAC and beamed out to the screen. So if you wanted to invert the color of the bottom right pixel it would take at most two frames or 33ms of latency if you were running at 60fps double buffered: first you set your pixel in the back buffer, wait up to 16.66ms to finish drawing the current front buffer, flip buffers, wait 16.66ms for the electron gun to make its way down to the bottom right corner, and then finally draw the inverted pixel.
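The same worst case as a quick Python calculation. It assumes a simplified raster model that ignores blanking intervals; `crt_worst_case_ms` and `scan_fraction` are names invented for this sketch.

```python
def crt_worst_case_ms(refresh_hz=60.0, scan_fraction=1.0):
    """Worst-case latency for a pixel on a double-buffered raster display.

    scan_fraction is how far through the scan the pixel sits:
    0.0 for the top-left corner, 1.0 for the bottom-right corner.
    """
    if not 0.0 <= scan_fraction <= 1.0:
        raise ValueError("scan_fraction must be between 0.0 and 1.0")
    frame_ms = 1000.0 / refresh_hz
    wait_for_flip = frame_ms               # just missed the flip: wait a full frame
    scan_time = scan_fraction * frame_ms   # beam travels down to the pixel
    return wait_for_flip + scan_time


print(f"bottom right: {crt_worst_case_ms():.1f} ms")
print(f"top left:     {crt_worst_case_ms(scan_fraction=0.0):.1f} ms")
```

The bottom-right case comes out at two frame times, about 33.3 ms at 60fps.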

With modern gpus, the situation is very similar to sending commands to a networked computer somewhere far away. You have a bit of two-way negotiation at the beginning to allocate gpu memory, upload textures/geometry/shaders, etc., and then you have a mostly one-way stream of commands. The gpu driver can queue these commands to an arbitrary depth, regardless of your vsync settings, double/triple buffering, etc., and is actually free to work on things out of order. You have to explicitly mark dependencies, and a 'present' call isn't intrinsically tied to when that buffer will actually end up displayed on screen. So there's no actual upper bound on latency here; even at 360hz, where each frame only takes 2.78ms to simulate and 2.78ms to render, the overall input lag could still be ~30ms if the gpu is perpetually 10 frames behind the cpu. (In practice though, drivers will typically only render 2-3 frames ahead.)
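Rough arithmetic for the queueing point, as a Python sketch. It assumes the lag from the queue is simply the number of frames the gpu is behind times the frame time; `queued_lag_ms` is a hypothetical helper, not a driver API.

```python
def queued_lag_ms(refresh_hz, frames_behind):
    """Lag added by a command queue that is frames_behind frames deep."""
    frame_ms = 1000.0 / refresh_hz
    return frames_behind * frame_ms


# The 360hz example from the comment: 10 frames queued.
print(f"360hz, 10 frames behind: {queued_lag_ms(360, 10):.1f} ms")

# The typical 2-3 frames of render-ahead at 60hz.
for depth in (2, 3):
    print(f"60hz, {depth} frames behind: {queued_lag_ms(60, depth):.1f} ms")
```

At 360hz and 10 frames deep that is about 27.8 ms, so the ~30ms figure holds even though each individual frame is fast.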

Yeah, I'm with you... even with double buffering these numbers don't hold water.
I don't know too much about the whole graphics pipeline, but this is definitely double-counting.

I will say though, whatever the numbers are, after running on a 144hz monitor with adaptive sync, 60 FPS feels painfully jerky for gaming.

Input + display lag is a decent chunk of that, at least.