by joshvm 2261 days ago
One thing that is rarely discussed (I think?) is how to test things which don't have a correct answer. It's not just "refactor until you can test"; the output itself may be subjective. For example, suppose you write some code to do some image processing, like a stereo matcher. How do you check your code works? Usually you have some ground truth to compare against, but it's difficult because you'll never get 100% accuracy. At best you can declare a baseline, e.g. that your algo should be, say, 90% accurate (if you implemented it properly based on literature results), and fail the test if you don't reach it. In that case you can use a numerical metric, but in other applications you might care about the result being aesthetically pleasing (e.g. a video ISP where you do colour correction on the stream coming from a low-level camera).
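A minimal sketch of the baseline idea, assuming accuracy means "fraction of values within some tolerance of the ground truth". The function names, the 90% baseline, the tolerance of 1.0, and the toy data are all illustrative assumptions, not taken from any particular stereo matcher:

```python
def fraction_within_tolerance(predicted, ground_truth, tolerance=1.0):
    """Fraction of values whose absolute error is at most `tolerance`."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("need at least one value to compare")
    hits = sum(1 for p, g in zip(predicted, ground_truth) if abs(p - g) <= tolerance)
    return hits / len(predicted)


def check_baseline(predicted, ground_truth, baseline=0.90, tolerance=1.0):
    """Fail (AssertionError) if accuracy drops below the declared baseline."""
    accuracy = fraction_within_tolerance(predicted, ground_truth, tolerance)
    assert accuracy >= baseline, f"accuracy {accuracy:.2%} is below baseline {baseline:.0%}"
    return accuracy


# Toy data standing in for a disparity map flattened to a list.
ground_truth = [10, 12, 15, 20, 22, 25, 30, 31, 33, 40]
predicted = [10, 12.5, 15, 19.5, 22, 25, 30, 31, 33, 47]  # last value is a bad match
print(check_baseline(predicted, ground_truth))
```

The test never demands an exact match; it only catches regressions that push the metric under the baseline you declared.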

Or take hardware, where the advice is usually to mock the device under test. But if you don't own the hardware, the most you can do is try to emulate it, and maybe check that your simulated state machine works. In my experience it's easier to run with hardware connected and just skip those tests otherwise. There are also extremely subtle bugs that can crop up with hardware interfaces, like needing to insert delays into code (e.g. when sending serial) that will otherwise fail in the real world.
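One way to "skip those tests otherwise" can be sketched with the standard library's unittest. The environment variable name DEVICE_PORT and the checksum helper are hypothetical; the point is only that hardware tests are skipped, not failed, when nothing is attached:

```python
import os
import unittest

# Hypothetical convention: the port of the attached device is passed in through
# an environment variable. If it is unset, we assume no hardware is connected.
HARDWARE_PORT = os.environ.get("DEVICE_PORT")


def checksum(payload: bytes) -> int:
    """Pure logic that can always be tested, with or without hardware."""
    return sum(payload) % 256


class ChecksumTest(unittest.TestCase):
    def test_checksum(self):
        self.assertEqual(checksum(b"\x01\x02\x03"), 6)


@unittest.skipUnless(HARDWARE_PORT, "no device attached (DEVICE_PORT not set)")
class DeviceTest(unittest.TestCase):
    def test_device_port_is_configured(self):
        # Placeholder: a real test would open HARDWARE_PORT and talk to the device.
        self.assertTrue(HARDWARE_PORT)


def run_all():
    """Run both test cases and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(ChecksumTest))
    suite.addTests(loader.loadTestsFromTestCase(DeviceTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_all()
```

On a machine without the device the run still counts as successful, and the skipped test is reported as skipped rather than silently missing.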

OpenCV has some interesting approaches to this, for example testing storing a video in a certain format, inserting a frame with a known shape (like a circle), then reading back the video and checking that the shape can be detected.

4 comments

I only mentioned it briefly at the end of the post, but metamorphic testing is a very interesting technique that addresses exactly this [0].

The basic idea is to start with some known-good inputs and outputs, and then generate ways to modify the input that should not change the output.

[0]: https://www.hillelwayne.com/post/metamorphic-testing/
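A small sketch of what that can look like, using a toy image function rather than a real stereo matcher. locate_peak, brighten, and apply_gain are made-up names for illustration; the two metamorphic relations assumed here are that a uniform brightness offset and a positive gain should not move the brightest pixel:

```python
import random


def locate_peak(image):
    """Toy function under test: (row, col) of the brightest pixel."""
    best = max(
        ((value, r, c) for r, row in enumerate(image) for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


def brighten(image, offset):
    """Add the same offset to every pixel."""
    return [[value + offset for value in row] for row in image]


def apply_gain(image, gain):
    """Multiply every pixel by the same positive gain."""
    return [[value * gain for value in row] for row in image]


def check_metamorphic(trials=100, seed=0):
    """The answer for the modified input must equal the answer for the original."""
    rng = random.Random(seed)
    for _ in range(trials):
        image = [[rng.randint(0, 255) for _ in range(8)] for _ in range(6)]
        expected = locate_peak(image)
        assert locate_peak(brighten(image, rng.randint(1, 50))) == expected
        assert locate_peak(apply_gain(image, rng.uniform(0.5, 2.0))) == expected
    return trials


check_metamorphic()
```

Note that the check never needs to know where the peak "should" be, only that it stays put under the transformation.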

Testing with hardware is hard. You can emulate to some degree, but that just helps make sure your tests are written correctly. In the end you have to run tests against the real thing. I deal with complex UIs that interact with hardware. If you are smart you can split things up so they are easier to test in isolation, but the whole system has a ton of potential interactions that are hard to write test cases for.

The OpenCV example is a pretty easy one. You have clear inputs with clearly defined outputs. The only thing you have to do is to create sample data.

> Testing with hardware is hard.

Yup. I work in robotics.

I try to isolate the actual hardware interaction layer so that for testing you can mock the driver and hardware in one piece. Of course that does not test the driver. With any luck, the driver is pretty stable once it works, though. And the driver+hardware piece can have its own (physical) test bench so that manual testing is, well, maybe not efficient, but at least not painful.
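Roughly what that isolation can look like, as a sketch using unittest.mock from the standard library. The Gripper class and the driver methods read_force and step_motor are invented for illustration; the mock stands in for driver and hardware together:

```python
from unittest.mock import Mock


class Gripper:
    """Application logic; it talks to hardware only through `driver`."""

    def __init__(self, driver, contact_force=0.5):
        self.driver = driver
        self.contact_force = contact_force

    def close_until_contact(self, max_steps=10):
        """Step the motor until the force sensor reports contact."""
        for step in range(max_steps):
            if self.driver.read_force() > self.contact_force:
                return step
            self.driver.step_motor(1)
        raise TimeoutError("no contact detected")


# The mock replaces driver + hardware in one piece (hypothetical interface).
driver = Mock()
driver.read_force.side_effect = [0.0, 0.1, 0.9]  # contact on the third reading

steps = Gripper(driver).close_until_contact()
assert steps == 2
assert driver.step_motor.call_count == 2
```

As noted above, this exercises only the logic on the application side of the boundary; the driver itself still needs the physical bench.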

Simulators are great but not always available. Or are too much work to get going.

One configuration often used for robots is the "boneless chicken". Take a bench, and bolt all the guts down to it in a configuration where they are easy to probe. Put the wheel motors someplace safe, with a synthetic load like a Prony brake. Of course you can't test the nav stack that way. (I once interviewed a firmware engineer who was coming off of the Juicero shutdown -- say what you want about Juicero, but from the sounds of it their boneless chicken was outstanding, even integrated into the CI automation pipeline. Of course, they didn't have the nav problem).

Speaking of nav, I once saw a warehouse robot company's nav PR test micro-warehouse. Not the full test warehouse, just a 500 square foot or so area dedicated to testing nav PRs. It was integrated with CI automation. I could tell from the accumulated tire marks on the floor that they had nav pretty much nailed.

I have done several robot-to-elevator interfaces (probably more than anyone else). In the end, final testing always required something akin to a few midnight to 4 AM test blocks on the real elevator. And then of course as you point out:

> the whole system has a ton of potential interactions that are hard to write test cases for.

They often don't show up until the system is under load.

Nice write-up. Just wanted to add that the problem with using simulators or mocks is that you now have one extra code base to maintain, for a total of three: the code, the tests, and the mock. For mock drivers this can be quite a big task. In the end I just preferred to run on real hardware as much as possible and go for unit tests. This from a person who generally does not like unit tests, but sometimes there is no cheap way of going with simulations.

Emulation works well if you have firmware and an emulator. For example, the Ardupilot autopilot software has both hardware-in-the-loop and software-in-the-loop packages which use the actual firmware. It runs off an STM32 emulator (I think) which is well defined. As you say, if you don't have that firmware, your emulator is only as good as your reverse engineering is.

When I'm testing thermal cameras there is a sequence of things I can check to ensure that the test worked: was the command sent without errors? Did I get an error back from the camera (e.g. CRC failure)? Does the state of the camera change as I expect it to? If all of those things are correct then the likelihood is that the command was sent OK. Of course for states you should check various permutations (e.g. shutter open and shutter closed) to make sure that you don't have a bug in your state reading code :)
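That sequence of checks, sketched against a fake camera so it runs without hardware. FakeCamera, its command strings, and the "OK" reply are invented stand-ins, not any real camera's protocol:

```python
class FakeCamera:
    """Stand-in for a real camera link (hypothetical protocol)."""

    def __init__(self):
        self.shutter_open = False

    def send(self, command):
        """Apply a command and return the device's reply string."""
        if command == "SHUTTER_OPEN":
            self.shutter_open = True
        elif command == "SHUTTER_CLOSE":
            self.shutter_open = False
        else:
            return "ERR_UNKNOWN"
        return "OK"

    def read_shutter(self):
        return self.shutter_open


def set_shutter_verified(camera, want_open):
    """Send a shutter command and verify it at each of the three levels."""
    command = "SHUTTER_OPEN" if want_open else "SHUTTER_CLOSE"
    reply = camera.send(command)  # 1) a transport failure would raise here
    if reply != "OK":  # 2) the device itself reported an error
        raise RuntimeError(f"camera rejected {command}: {reply}")
    if camera.read_shutter() != want_open:  # 3) read the state back
        raise RuntimeError(f"shutter state did not change after {command}")
    return True


# Exercise both permutations, so a state reader stuck on one value is caught.
camera = FakeCamera()
for want_open in (True, False, True):
    set_shutter_verified(camera, want_open)
```

Toggling through both states matters: a read-back that always returns the same value would pass a single-direction check by accident.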

Here's a stereo matching example from OpenCV. This is a case when you do have the correct answer, but you don't expect to equal it, and your tolerance to accuracy varies with algorithm:

https://github.com/opencv/opencv/blob/055645080161c6af6083b6...

Tell me about it! I work on a search engine powered by relatively basic machine learning of user behaviour. We probably achieve the most relevant results in the world for our customers. That's not the hard part.

The hard part is tightening our development feedback cycle. Since we outperform all competitors, we don't have an oracle to test against. We can automate testing with a small sample of input-output pairs, but the brunt of the work is still done by humans trained and paid to judge the quality of the results. It's an awful position to be in.

I have started looking for better ways of doing it, and the most promising I've found so far is metamorphic testing, mentioned in another comment.

Property testing only takes you a short way here, as far as I've been able to figure out.

(I have also glanced at the techniques used in bioinformatics, since those guys are good at comparing sequences, but that's more specific to our case and not a general solution.)

> How do you check your code works?

When I think about my projects "working", I always try to answer the following questions:

1) Is my code doing what I believe it should be doing? That question always has an objective answer and is the subject of software engineering testing.

2) Is my solution solving my problem efficiently? Often that's a domain-specific question, and different domains have different ways of doing quality assurance. There's no silver bullet.