by flohofwoe 2040 days ago
The Source engine hails from the 90s, testing hadn't been invented back then ;)

But on a more serious note, writing automated tests for game engines involves a lot more than just "duh, unit tests" (especially when testability wasn't a concern in the original design).


Also, since this case is about the graphics stack: it's not like OpenGL or DirectX are set up to be testable. You can't really "unit test" shader code; the best you can do is render it in some test scenes and screenshot the results, which ends up flaky & noisy due to valid-per-spec differences in GPU & driver behavior.
Every language has valid-per-spec differences. That's exactly why you test.

For sure OpenGL/DX requires more infrastructure to run unit tests than a generic block of C code. But it's absolutely possible to "unit test" shader code, with buffer read-back and/or vertex stream out, among other options. It's more the game engines themselves that aren't set up for unit tests, rather than the graphics stack.

> For sure OpenGL/DX requires more infrastructure to run unit tests than a generic block of C code. But it's absolutely possible to "unit test" shader code, with buffer read-back and/or vertex stream out, among other options.

Which is what I said, you can screenshot & compare. But it becomes a fuzzy compare due to acceptable precision differences.

And it ends up being more of an integration test and not a unit test.

> Every language has valid-per-spec differences.

They really don't, but that's not entirely what I'm talking about. I'm talking about valid hardware behavior differences, which broadly don't exist elsewhere. How a float performs in Java is well-defined and never changes. How numbers perform in most languages is well-defined and does not vary.

GPU shaders are completely different. Numbers do not have consistent behavior across differing hardware & drivers. This is a highly unusual situation. Even things that are claimed to be variable in some languages (like the size of int in C & C++) end up not actually varying in practice, because existing code doesn't cope well with it. Shaders don't play any such games.

> Which is what I said, you can screenshot & compare. But it becomes a fuzzy compare due to acceptable precision differences

You make it sound less rigorous than it can be. A readback doesn't need to be a "screenshot" and doesn't need to be of a full scene. A frame buffer can be a 1x1 value.

Regarding precision differences, it's not much different than testing floating point math anywhere else. Shaders allow fast-math style optimizations generally, but they can be disabled at least on some platforms[1][2], otherwise one can take care in floating point math, or provide tests just using integer math.

> And it ends up being more of an integration test and not a unit test.

Sure, if you just set up scenes, render, screenshot and do a fuzzy compare, that looks more like an integration test. And I agree it's more common to see integration tests for renderers. It is a bit more involved, since you have to deal with uploads, command queues and readbacks, but you really can set up the infrastructure to do proper unit tests. Then you can decide how you want to handle unit testing of flexible-precision code: either toggling precision in the compilers, or building your tests to properly bound your expected precision, or both.
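To make "bound your expected precision" concrete, here is a minimal CPU-side sketch in Python. It assumes the value under test has already been read back from the GPU as a 32-bit float (say, from a 1x1 render target); the readback itself is not shown. The helper names `ordered_bits`, `ulp_distance` and `assert_within_ulps` are made up for this example and are not part of any graphics API.

```python
import math
import struct


def ordered_bits(x: float) -> int:
    """Round x to float32 and map its bit pattern to a monotonic integer."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # float32 is sign-magnitude: mirror negative values below zero so that
    # adjacent floats always map to adjacent integers (and -0.0 == +0.0).
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 steps between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def assert_within_ulps(actual: float, expected: float, max_ulps: int) -> None:
    """Fail unless actual is within max_ulps float32 steps of expected."""
    if math.isnan(actual) or math.isnan(expected):
        raise AssertionError("NaN is never within tolerance")
    distance = ulp_distance(actual, expected)
    if distance > max_ulps:
        raise AssertionError(
            f"{actual!r} is {distance} ULPs from {expected!r} (limit {max_ulps})"
        )


if __name__ == "__main__":
    # Stand-in for a value read back from a 1x1 render target (assumption):
    # one float32 step above the reference value 1.0.
    readback = 1.0 + 2 ** -23
    assert_within_ulps(readback, expected=1.0, max_ulps=2)
```

The tolerance is stated in units of float32 steps rather than as a hand-tuned epsilon, so the same test states explicitly how much driver or fast-math variation it is willing to accept.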

> GPU shaders are completely different. Numbers do not have consistent behavior across differing hardware & drivers.

This is an outdated view and simply not true: every modern (PC-spec?) GPU has IEEE 754 compliant floats. They have to; otherwise GPGPU wouldn't have taken off in scientific computing. Compiler defaults may just not be right.

[1] https://github.com/Microsoft/DirectXShaderCompiler/blob/mast...
[2] See: #pragma optionNV(fastmath off)

Game code can be hard to unit test; it needs integration tests on actual hardware. There is lots of weird chip-specific behavior that needs to be taken care of.
There's also no easy way to test for things like "do the shadows render correctly".

About the only thing you can do is take before/after screenshots and compute a signal-to-noise ratio on a diff between the images. Which makes for an extremely fragile test definition. What if you change the default FOV of the camera? Now all your tests fail for no good reason.
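For reference, the signal-to-noise comparison described above is usually computed as PSNR (peak signal-to-noise ratio). A minimal sketch in Python, assuming both screenshots are already decoded into equally sized grids of 8-bit grayscale values; the function names and the 40 dB threshold are assumptions for illustration, not values from any particular engine.

```python
import math


def psnr(image_a, image_b, max_value=255):
    """Peak signal-to-noise ratio in dB between two same-sized pixel grids."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have identical dimensions")
    squared_error = sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for pa, pb in zip(row_a, row_b)
    )
    if squared_error == 0:
        return math.inf  # identical images: no noise at all
    pixel_count = sum(len(row) for row in image_a)
    mean_squared_error = squared_error / pixel_count
    return 10 * math.log10(max_value ** 2 / mean_squared_error)


def matches_golden(image, golden, min_psnr_db=40.0):
    """True if image is close enough to the golden screenshot (assumed threshold)."""
    return psnr(image, golden) >= min_psnr_db
```

Any change that moves pixels around, such as a different camera FOV, lowers the PSNR across the whole frame, which is exactly the sensitivity being debated here.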

> What if you change the default FOV of the camera? Now all your tests fail for no good reason.

Which is completely fine, because you probably wouldn't want to accidentally change the FOV, would you?

High confidence tests fail on unexpected results. If only some aspects of the results are checked, the tests have obvious blind spots.

Yep, lots of interaction tests that can be quite brittle and can take a long time to run, even with a farm of servers and consoles.

A lot of small and low level stuff can be unit tested, but during production things like writing good tests fall through the cracks.

And game development usually involves a lot of iteration, so setting up tests is at best a waste of time, and at worst a crutch that hurts productivity.
The best benefit we've found for unit tests is in low level platform specific code and generic containers. Things of that nature which absolutely must work and can themselves be tested in isolation.
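As an illustration of the "generic containers tested in isolation" case, here is a small sketch in Python: a fixed-capacity ring buffer plus one unit test. `RingBuffer` is a made-up example container, not code from any engine mentioned in this thread.

```python
class RingBuffer:
    """Fixed-capacity FIFO that overwrites the oldest item when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = [None] * capacity
        self._head = 0  # index of the oldest stored item
        self._size = 0

    def __len__(self):
        return self._size

    def push(self, item):
        capacity = len(self._items)
        tail = (self._head + self._size) % capacity
        self._items[tail] = item
        if self._size == capacity:
            # Buffer was full: the oldest item was just overwritten.
            self._head = (self._head + 1) % capacity
        else:
            self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty RingBuffer")
        item = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        return item


def test_overwrites_oldest_when_full():
    buffer = RingBuffer(capacity=3)
    for value in (1, 2, 3, 4):
        buffer.push(value)
    assert [buffer.pop() for _ in range(3)] == [2, 3, 4]
    assert len(buffer) == 0
```

Code like this has no dependency on rendering, input or timing, which is why it is the easy end of the testing problem.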

I'm sure when we finish the project and look back on it we can go in, clean up, and implement far more unit tests for the code we already have.

I am surprised that game engines aren't set up to test simple scenarios programmatically. I play a lot of Overwatch and the bugs / patch notes about fixing those bugs amaze me every time; they tell me a lot about how the software is designed and tested.

There was one bug where a character has a deployable ability that doubles the damage and healing of all projectiles that pass through it. One day, the patch notes read "fixed an issue where healing was not amplified when passing through the amplification matrix". And, I totally get it... every conference talk I've seen out of Blizzard goes into details about all the infrastructure they've made for play testing their games. It sounds easy to get your coworkers into a build of your latest PR and try it out. But things like these subtle numbers adjustments just don't translate well to play testing -- sometimes the enemy is doing so much damage that you can't really be sure that the problem is that the Amp Matrix isn't multiplying the healing by the right number. So, from time to time, refactors break it!

But, in a world where you could easily write integration tests, this problem would never happen. You'd write a simple scenario like "create empty room. place baptiste at position 0,0. deploy amp matrix at position 10,0 with orientation 90 degrees. place sombra at position 20,0. set her health to 80. make baptiste fire a healing grenade along vector 1,0 at an angle of 45 degrees. wait 10 ticks. ensure that sombra's health is now 200." The framework to be able to write tests like this is not difficult (you can do it in their "workshop"), and it's not difficult to write a test like this for every ability, and even every combination of abilities. And, it would mean that play testers never ever need to be suspicious of numbers; the automated tests already check that. You'd make developers more productive (the computer can check the basics like this), and play testers more productive (they don't need to test simple stuff anymore). But... I don't think they do it. The buggiest releases are when the team is under time pressure to hit a deadline (Overwatch has seasonal events; the patch that introduces a seasonal event always has some weird bugs), and I don't think automated tests miss things under time pressure -- but humans sure do.
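The scenario above could look roughly like the following Python sketch. Everything in it is hypothetical: `World`, `Hero`, `fire_heal` and the constants are invented for illustration and have nothing to do with Blizzard's actual code or the Workshop API. The world is reduced to one dimension with no ticks or projectile arcs, and the base heal of 60 is simply chosen so that 80 + 2 * 60 gives the 200 from the scenario.

```python
from dataclasses import dataclass, field

AMP_MULTIPLIER = 2.0  # the matrix doubles healing, per the description above
BASE_HEAL = 60.0      # assumed value, picked so that 80 + 2 * 60 == 200


@dataclass
class Hero:
    name: str
    x: float
    health: float


@dataclass
class World:
    heroes: dict = field(default_factory=dict)
    amp_matrices: list = field(default_factory=list)  # x positions

    def place(self, name, x, health=200.0):
        self.heroes[name] = Hero(name, x, health)

    def deploy_amp_matrix(self, x):
        self.amp_matrices.append(x)

    def fire_heal(self, source, target, amount=BASE_HEAL):
        """Heal target; double the amount if a matrix lies between the two."""
        src, tgt = self.heroes[source], self.heroes[target]
        low, high = sorted((src.x, tgt.x))
        if any(low < matrix_x < high for matrix_x in self.amp_matrices):
            amount *= AMP_MULTIPLIER
        tgt.health += amount


def run_amp_scenario(with_matrix):
    world = World()
    world.place("baptiste", x=0)
    if with_matrix:
        world.deploy_amp_matrix(x=10)
    world.place("sombra", x=20, health=80)
    world.fire_heal("baptiste", "sombra")
    return world.heroes["sombra"].health


def test_healing_is_amplified_through_matrix():
    assert run_amp_scenario(with_matrix=True) == 200
    # Control case: without the matrix only the base heal applies.
    assert run_amp_scenario(with_matrix=False) == 140
```

The control case matters as much as the main assertion: it is what catches a refactor that silently amplifies everything or nothing.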

The one thing I'll give Blizzard credit for is that their games are fun. All that playtesting is certainly a good idea. I'd supplement it with some gameplay-focused integration tests, though. They have the money and the tools teams, and their games last longer than a few months, so it just seems like a smart investment to me. So it just baffles me what bugs ship to production.

The only game I'm aware of with an extensive automated testing infrastructure is Minecraft:

https://youtu.be/vXaWOJTCYNg?t=993

Riot has written about their League of Legends test infrastructure: https://technology.riotgames.com/news/automated-testing-leag...
Interesting! That looks exactly like the test I wanted to write.