Hacker News
by tartley 5084 days ago
I'm a big fan of unit & functional testing. One reason people don't write functional tests, though, is that they are way harder to write than unit tests (assuming your unit tests are relatively straightforward because your code is well factored).

I always find starting functional tests, especially in a new area of your program, to be a tremendous undertaking. Maybe I'm doing it wrong. For example, last week we moved on to writing code that talks to an SFTP server, so our functional test spins up an SFTP server locally & overrides the product code settings so that it connects to that instead. Getting this set up and working took me a couple of days, and it is such a tiny aspect of the code under test that I'm afraid I'm exhausting my teammates' patience for "test-first functional tests".
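A minimal Python sketch of the setup pattern described above: start a throwaway server on a free local port, point the product settings at it for the duration of the test, and restore them afterwards. Everything here is an assumption for illustration: the `SETTINGS` dict, its keys, and `product_send` are hypothetical, and a one-message echo handler from the standard library's `socketserver` stands in for a real SFTP server, which this sketch does not implement.

```python
import contextlib
import socket
import socketserver
import threading

# Hypothetical product settings; normally these point at the real SFTP host.
SETTINGS = {"sftp_host": "sftp.example.com", "sftp_port": 22}


def product_send(payload: bytes, settings: dict = SETTINGS) -> bytes:
    """Hypothetical product code: connects wherever the settings tell it to."""
    address = (settings["sftp_host"], settings["sftp_port"])
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall(payload)
        return conn.recv(1024)


class _EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in protocol: echoes one message back. A real test would run an SFTP server."""

    def handle(self) -> None:
        self.request.sendall(self.request.recv(1024))


@contextlib.contextmanager
def local_server_for_tests(settings: dict):
    """Start a throwaway server on a free local port and redirect the settings to it."""
    server = socketserver.TCPServer(("127.0.0.1", 0), _EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    saved = dict(settings)
    host, port = server.server_address
    settings.update(sftp_host=host, sftp_port=port)
    try:
        yield server
    finally:
        # Put the product settings back and stop the server, even if the test failed.
        settings.clear()
        settings.update(saved)
        server.shutdown()
        server.server_close()
```

A test would then wrap its body in `with local_server_for_tests(SETTINGS):` and exercise the product code normally. Mutating the existing settings object in place (rather than swapping in a new one) means any code already holding a reference to it sees the override.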

In other areas, e.g. hobbyist OpenGL code, I think this initial barrier to getting started is so high (you'd have to write a sort of machine vision thing to analyse the actual output, rather than just checking which OpenGL calls were made) that truly end-to-end tests are a complete non-starter. You have to give in and settle for integration tests instead.

I suspect many people are in the same place with regard to functional testing of web apps. Yes, there are tools like Selenium, but there are issues with test data and with making sure the product code connects to test databases. These are solvable, but people don't see the value as greater than the cost of wrestling with them.


"a sort of machine vision thing" -- You could use Sikuli or SimpleCV. I'm using OpenCV in my video game playing robot to find elements on the screen. (Of course, it's a bit of a Frankenstein to create and run.) At least it's good to know it's possible, though. But, yes, I can see why others would not want to go through this much work -- it is hard.
Hey. Out of interest, Jason, would you personally write a functional test that spun up a local SFTP server like the one I describe? As a better-than-average developer, you presumably wouldn't need a couple of days to get that working, but even so, would you consider cheating a little and mocking out the SFTP client calls? If so, would you still call it a 'functional test'?
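For contrast, the 'cheating' option might look something like this with the standard library's `unittest.mock`. `publish` and the client's `put(local, remote)` method are hypothetical stand-ins for the product code and whatever SFTP library it actually uses. Note what the test can and cannot see: it checks only which calls the product code makes, and no server is ever contacted, which is what makes the 'functional test' label questionable.

```python
import posixpath
from unittest import mock


def publish(sftp_client, local_path: str, remote_dir: str) -> str:
    """Hypothetical product code: upload a file via an injected SFTP client."""
    remote_path = posixpath.join(remote_dir, posixpath.basename(local_path))
    sftp_client.put(local_path, remote_path)
    return remote_path


def test_publish_uses_sftp_client() -> None:
    # The mock records calls instead of talking to any server.
    fake_client = mock.Mock()
    result = publish(fake_client, "/tmp/report.csv", "/incoming")
    fake_client.put.assert_called_once_with("/tmp/report.csv", "/incoming/report.csv")
    assert result == "/incoming/report.csv"
```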
For OpenGL testing:

If you just want to make sure that your changes aren't breaking something that is already working, you could have your code automatically create an image and then compare that image file with what you have verified as correct.

This assumes that the output for a given scenario is deterministic, and it requires you to hand-inspect new images whenever a code change affects the output.
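A minimal Python sketch of that snapshot approach, assuming the rendering code can hand back a frame as raw bytes (how to read them back from OpenGL is outside this sketch). The file naming and the `.new` convention for unapproved output are assumptions: a missing or mismatched golden image fails the check and leaves the new output beside it, so someone can inspect it and, if it's correct, promote it to be the new golden image.

```python
from pathlib import Path


def matches_golden(rendered: bytes, golden_path: Path) -> bool:
    """Exact (byte-for-byte) comparison against a hand-approved golden image.

    On a mismatch, or if no golden image exists yet, the new output is saved
    as '<name>.new' for a human to review, and the check fails rather than
    silently accepting unreviewed output.
    """
    if golden_path.exists() and golden_path.read_bytes() == rendered:
        return True
    candidate = golden_path.with_name(golden_path.name + ".new")
    candidate.write_bytes(rendered)
    return False
```

Exact comparison is only meaningful under the determinism assumption above; any run-to-run variation in the output makes every check fail.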

Hey. It's a nice idea, but there are problems with it. The actual output varies depending on your hardware & driver combo, and is substantially affected by the state of the graphics drivers (e.g. what types of anti-aliasing or interpolation are enabled, and what color profiles are loaded). The RGB pixel values on one machine will not match those on another machine.

You could imagine writing an 'is image almost equal' comparison, but I'm told by people who have tried (the pyglet developers) that this is substantially harder than it sounds: the differences between images are not what you would expect.
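For concreteness, the most naive 'almost equal' check might look like the Python sketch below. It assumes two same-sized, tightly packed 8-bit RGB buffers, and both thresholds are arbitrary assumptions. It is shown only to make the idea concrete: per the comment above, people who have tried this kind of per-pixel tolerance found real cross-machine differences harder to capture than it suggests.

```python
def images_almost_equal(a: bytes, b: bytes, channel_tolerance: int = 8,
                        max_bad_fraction: float = 0.001) -> bool:
    """Naive 'almost equal' check for two packed 8-bit RGB buffers.

    A pixel counts as 'bad' if any channel differs by more than
    channel_tolerance; the images are 'almost equal' if at most
    max_bad_fraction of all pixels are bad.
    """
    if len(a) != len(b) or len(a) % 3 != 0:
        return False
    pixel_count = len(a) // 3
    bad_pixels = 0
    for i in range(0, len(a), 3):
        # Indexing bytes yields ints, so channel differences are plain arithmetic.
        if any(abs(a[i + c] - b[i + c]) > channel_tolerance for c in range(3)):
            bad_pixels += 1
    return bad_pixels <= max_bad_fraction * pixel_count
```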

The alternative, if you want anyone else to be able to run your tests, is to tie yourself to a particular OS/hardware/driver combination, which isn't appealing for many projects.

Even if this could be done, this sort of 'compare snapshot' test is brittle, because we're talking about high-level functional tests here, so you'd be snapshotting your whole game/application, not just limited aspects of it in a limited environment. Hence the screenshots would change all the time. Every time you added or modified any functionality, you'd get a failing test and have to compare the images manually, confirm that the differences were OK, and then commit the new screenshot. This is ripe for overlooking small regressions, and it makes subsequent bisection very difficult.

Of course, we haven't even got into the fact that, as an end-to-end test, your test code would actually have to interpret the images and send mouse/key inputs to play your game successfully, all the way through to completion; how else would you know your game-completion conditions were all wired up correctly?

I agree that there are generally problems with doing this; at a previous job we thought a bit about doing it when testing an After Effects plug-in we were developing, but we never actually did.

Just wanted to add that one technique that could make this work better across different kinds of hardware/driver settings would be to share high-level results among testers (e.g. "for release 1200, these images look okay") rather than the actual images, so each tester would generate their own "correct" images. Yes, it is possible that some of the images a given tester assumes are correct aren't actually correct on their machine because of its hardware configuration. But if you care about that, you wouldn't be able to share test results anyway. And you could still test actual rendering on different kinds of hardware in a separate pass.
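A rough Python sketch of that scheme, under several assumptions: the only shared artifact is a small JSON record of approved releases, each tester keeps golden images in a local directory keyed by a fingerprint of their own OS/GPU/driver combination, and the fingerprint scheme, JSON layout, and all names here are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def machine_key(os_name: str, gpu_renderer: str, driver_version: str) -> str:
    """Short fingerprint of one tester's OS/GPU/driver combination (hypothetical scheme)."""
    raw = f"{os_name}|{gpu_renderer}|{driver_version}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


def is_approved(release: int, shared_results_json: str) -> bool:
    """The only thing testers share: which releases were judged to render correctly."""
    return release in json.loads(shared_results_json).get("approved_releases", [])


class LocalBaselines:
    """Golden images each tester generates on their own machine from an approved release."""

    def __init__(self, root: Path, key: str) -> None:
        self.dir = root / key
        self.dir.mkdir(parents=True, exist_ok=True)

    def record(self, scenario: str, rendered: bytes) -> None:
        (self.dir / f"{scenario}.raw").write_bytes(rendered)

    def matches(self, scenario: str, rendered: bytes) -> bool:
        path = self.dir / f"{scenario}.raw"
        return path.exists() and path.read_bytes() == rendered
```

In this sketch a tester would call `record` for each scenario while running a release that `is_approved` reports as good, then call `matches` on later releases. Pixel data never leaves the machine, so differences between drivers don't cause spurious failures, with the caveat from the comment above that a locally recorded baseline may itself be subtly wrong on unusual hardware.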