Hacker News
How (Much) Do You Test? (codetrails.com)
16 points by inventitech 4197 days ago
5 comments

I honestly don't understand what the point of this is. What does the time spent writing something have to do with its quality?

Tests are supposed to be simple and straightforward; I would hope to spend less than 5% of my time actually writing tests. If I spend more than that, it means my tests are too complex to understand, read and write, and that my whole architecture is hard to test.

Are we trying to evaluate the quality of code (yes, tests are code too) by the amount of time spent writing it? Should we go back to assembly because it takes longer to write properly, so the result must be more correct?

I honestly don't understand the point, enlighten me, please.

The first step is just to observe how much time is spent on testing. We basically have no clue about this, and our first study with students showed that they had no clue about it either.

In a second step, we can think about what implications testing time might have on quality. As you correctly said, more time does not necessarily correlate with better quality or higher productivity. Maybe there is a certain range of testing effort that can be associated with good-quality tests? E.g. if you spend less than x on testing, your tests are likely to be bad. If you spend more than y on testing, it might be worth investigating whether you have unusually high testing targets, or whether your tests are extremely hard to maintain.
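To make the x/y idea concrete, here is a minimal sketch of such a banding heuristic in Python. The function name, the band labels and the example thresholds (0.1 and 0.5) are hypothetical placeholders; as said above, nobody knows yet what sensible values for x and y would be.

```python
from enum import Enum


class EffortBand(Enum):
    LOW = "possibly too little testing"
    NORMAL = "within the expected range"
    HIGH = "worth investigating (high targets or hard-to-maintain tests?)"


def classify_testing_effort(test_hours: float, total_hours: float,
                            x: float, y: float) -> EffortBand:
    """Classify the share of development time spent on testing.

    x and y are hypothetical thresholds, given as fractions of total time.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= x <= y <= 1:
        raise ValueError("expected 0 <= x <= y <= 1")
    share = test_hours / total_hours
    if share < x:
        return EffortBand.LOW
    if share > y:
        return EffortBand.HIGH
    return EffortBand.NORMAL


if __name__ == "__main__":
    # Placeholder thresholds only, NOT research results.
    print(classify_testing_effort(2, 40, x=0.1, y=0.5))
```

The interesting (and open) part is of course choosing x and y; the code only shows that, once known, they would give a simple flag rather than a quality score.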

I think the answer is Janus-faced and there is no single, simple answer (see also "Testivus on Test Coverage", http://www.artima.com/weblogs/viewpost.jsp?thread=204677).

The longer I've written tests, the truer this becomes. Nowadays it doesn't take me much time to write tests. I'm not a TDD zealot, but I often write tests in conjunction with a new feature. I've found that it makes the overall process faster, the resulting code simpler, and the interface to the code more consistent.

A small investment of time writing tests before implementation yields the best overall results across the board.

This looks like a great example of "Be careful what you measure, because that is exactly what you will improve."
Yes, this is certainly true, though (I think) only to a minimal extent. Why? Because improving the time you spend on testing is inherently hard to do, and it is only meta-information anyway.

An example: a quality assessment of your code tells you that your methods are too long. This gives you a concrete task: you simply split all the overly long methods. With testing time, however, this is not the case. Increasing the effort spent on testing is not something you can "just do" without a sensible plan. If you want to increase this metric, you have to start thinking about what is wrong with your current test strategy (maybe nothing is wrong at all) and come up with an action plan for how to fix it.
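For contrast, a length metric like the one above maps directly onto a to-do list. A rough sketch using Python's standard `ast` module (Python 3.8+, which provides `end_lineno`); the helper name and the 20-line limit are arbitrary assumptions for illustration:

```python
import ast
from typing import List, Tuple


def long_functions(source: str, max_lines: int = 20) -> List[Tuple[str, int]]:
    """Return (name, line_count) for every function longer than max_lines.

    Counts run from the `def` line to the last statement of the body, so
    docstrings and interior comments count toward the length.
    """
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders
```

Each entry in the result is a concrete "split this method" task; nothing comparable falls out of a testing-time number.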

If you're that far into the game already, I think this metric has done what it can do. Plus, you'll likely see an increase in the metric, which is justified.

Determining (critical) levels of testing effort in the wild is our task now.

I find myself bouncing between at least three modes.

There are some things I really know how to code correctly off the top of my head, and in cases like that I sometimes put very little effort into testing.

If I am writing something greenfield that I don't completely understand (say some algorithm inspired by binary search or any other place where off-by-ones could eat you alive) then I would say there is little difference between coding and testing; mostly I use tests and the Java debugger the same way that Ruby or Python people use the REPL, except my tests get checked in at the end.
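As an illustration of that exploratory style (in Python rather than Java, and not the commenter's actual code), here is a lower-bound binary search checked exhaustively against a brute-force reference on every small sorted input; checks like this tend to flush out off-by-one errors quickly. The function names are made up for the example.

```python
import itertools
from typing import List


def lower_bound(xs: List[int], target: int) -> int:
    """Index of the first element >= target in sorted xs (len(xs) if none)."""
    lo, hi = 0, len(xs)  # search the half-open interval [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def brute_force_lower_bound(xs: List[int], target: int) -> int:
    # Obviously-correct linear scan used as the reference answer.
    for i, x in enumerate(xs):
        if x >= target:
            return i
    return len(xs)


def check_exhaustively(max_len: int = 6) -> None:
    # combinations_with_replacement yields every sorted tuple of small ints,
    # including duplicates, which is where off-by-ones usually hide.
    for n in range(max_len + 1):
        for combo in itertools.combinations_with_replacement(range(4), n):
            xs = list(combo)
            for target in range(-1, 5):
                expected = brute_force_lower_bound(xs, target)
                assert lower_bound(xs, target) == expected, (xs, target)
```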

Another case is dealing with legacy code where I often end up writing tests to document existing behaviors and prove that the system behaves correctly after refactoring. I've seen many code bases that were awful (written by the kind of people who struggle to have unique primary keys) but were salvageable because they had a test suite.
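A minimal sketch of the "document existing behavior" idea, often called a characterization test: record what the legacy code returns over a grid of inputs, then assert that the refactored version returns the same. Both discount functions are hypothetical stand-ins; in a real code base you would typically snapshot the legacy outputs to a file before touching the code.

```python
def legacy_discount(total, customer_type):
    # Hypothetical legacy code whose current behavior we want to pin down.
    if customer_type == "gold":
        if total > 100:
            return total * 0.8
        return total * 0.9
    elif customer_type == "silver":
        return total * 0.95 if total > 50 else total
    return total


def refactored_discount(total: float, customer_type: str) -> float:
    # Table-driven rewrite: (threshold, rate above threshold, rate otherwise).
    rates = {"gold": (100, 0.8, 0.9), "silver": (50, 0.95, 1.0)}
    if customer_type not in rates:
        return total
    threshold, high_rate, low_rate = rates[customer_type]
    return total * (high_rate if total > threshold else low_rate)


# Boundary values around each threshold, plus unknown customer types.
INPUTS = [(t, c) for t in (0, 49, 50, 51, 100, 101, 250)
          for c in ("gold", "silver", "bronze", "")]


def record_behavior(fn, inputs):
    return {args: fn(*args) for args in inputs}


def test_refactoring_preserves_behavior():
    expected = record_behavior(legacy_discount, INPUTS)
    actual = record_behavior(refactored_discount, INPUTS)
    assert actual == expected
```

The test says nothing about whether the legacy behavior is *right*, only that the refactoring did not change it, which is exactly what you want before cleaning up an awful code base.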

Then on top of that there are a number of special cases too; for instance, if you are doing something with threads that is complicated at all you probably want to write a load balancer or if you're doing classic Map/Reduce you'd better test your mappers and reducers thoroughly before you ever touch a cluster.
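For the Map/Reduce case, one common approach (sketched here with a hypothetical word-count job, not tied to Hadoop or any specific framework) is to keep the mapper and reducer as plain functions and test them, plus a tiny in-memory stand-in for the shuffle phase, before going anywhere near a cluster:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for each lowercased, whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    return word, sum(counts)


def run_locally(lines: List[str]) -> Dict[str, int]:
    # In-memory stand-in for the shuffle: group mapper output by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in grouped.items())
```

Because the functions are pure, each can be unit-tested in milliseconds; the local runner only checks that they compose, not how a real cluster partitions or orders data.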

Wow, the results are quite eye-opening. I'm admittedly not the most "diligent" when it comes to testing (I write enough to pass code review), but I would definitely fall in the camp of overestimating the amount of time spent testing by 3x.

But I'm not quite sure if there is a direct correlation between time spent writing tests and overall software quality?

> But I'm not quite sure if there is a direct correlation between time spent writing tests and overall software quality?

Good point. We do not know, either. That is part of the reason why we do this research. :)

As has been said before, the easier your tests are to understand and maintain, the less time you actually need to spend on them.

However, you could argue that there must be some minimum amount of time you absolutely need to spend on QA work like testing to ensure a certain level of quality.

We are investigating this.

I generally don't write unit tests, and very rarely write system tests. Partly this is because of the type of stuff I deal with (Django applications), partly it's because I generally have access to good QA people, but mostly it's because the kinds of errors I see would not be caught by the kind of tests I can write in any reasonable amount of time. Here are some recent real-world errors I've had to deal with, where testing wouldn't have saved me:

- Poorly documented external API returns a redirect instead of a 404 in some cases.

- Missing <title> tag in a random HTML template.

- Poorly assembled SSL certificate chain for an HTTPS service fails for some browsers, but not all.

- Celery tasks taking longer to process because there are now more things to do as more users sign up.

- Starting a DB transaction in the wrong place causes more failures than necessary when only some of the modified rows cause constraint violations.

- External API returns a zero-byte file with no errors instead of the correct response, depending on the time of day and the phase of the moon.

- CSS issue rooted in poorly set up position and z-index properties causes elements to be mis-aligned.

- Missing clear: both;

- Database server's disk filled up with log files from a rogue service.

- Hosting provider gets DDoS'ed and their mitigation software starts returning random site redirects.

- Namecheap's DNS gets DDoS'ed and the site is inaccessible.

These are not the types of things that are easy to test, but they are very easy to verify by hand or catch at runtime. Personally, I am much more in favor of design-by-contract and of logging and alerting on bad conditions at runtime. That's not to say I wouldn't test a financial trading algorithm or a binary search library function. One of my projects (LogHog, a much easier-to-use syslog-type thing for Python) has lots of tests because it has a very simple and well-defined interface. It's almost library code, and verifying its correctness is both easy and useful at the same time. But not all projects are like that, and sometimes easy testing is not actually useful and useful testing is cost-prohibitive.
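A minimal sketch of the runtime-checking style described above, using only Python's standard `logging` and `shutil` modules. The `require` helper, the zero-byte check (motivated by the misbehaving external API in the list) and the 10% free-disk threshold are all illustrative assumptions, not part of LogHog or any particular library:

```python
import logging
import shutil

logger = logging.getLogger("runtime_checks")


class ContractViolation(Exception):
    pass


def require(condition: bool, message: str) -> None:
    """Minimal design-by-contract check: log the problem, then raise."""
    if not condition:
        logger.error("contract violated: %s", message)
        raise ContractViolation(message)


def accept_download(payload: bytes) -> bytes:
    # Postcondition on a hypothetical external API call: an empty body is
    # treated as a failure even if the API reported success.
    require(len(payload) > 0, "external API returned a zero-byte file")
    return payload


def check_free_disk(path: str = "/", min_free_fraction: float = 0.1) -> bool:
    # Warn (here via logging; in production, via your alerting system)
    # before the disk actually fills up.
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        logger.warning("low disk space on %s: %.1f%% free",
                       path, free_fraction * 100)
        return False
    return True
```

Neither check proves anything ahead of time; the design choice is to fail loudly and early in production instead.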

tl;dr: I can test the obvious stuff, but I can also just spend more time writing it carefully. It's the non-obvious stuff that breaks: disks fill up, external APIs misbehave, bad CSS, etc. You will never have "100% code coverage" because you are not testing all the stuff that actually affects your system, and your users don't care why the application is not working, they just care that it's not.

I generally agree with you.

"sometimes easy testing is not actually useful and useful testing is cost-prohibitive."

While I agree here too, I think situations where testing is not useful happen very rarely in practice. Everything can break, and even when you think it cannot possibly be wrong now, a future regression might occur.

I think badly written tests are a different problem, e.g. tests that assert on a serialized string and hence take a lot of maintenance effort to adapt to changing production code. There is nothing wrong with such a test per se; it's just badly implemented.
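To illustrate the difference (a hypothetical example using Python's `json` module): the first check compares the exact serialized string and breaks when formatting changes, while the second compares the decoded data and survives a harmless change such as adding `indent=2, sort_keys=True`.

```python
import json

USER = {"name": "ada", "roles": ["admin"]}


def serialize_v1(user: dict) -> str:
    # Original (hypothetical) production serializer.
    return json.dumps(user)


def serialize_v2(user: dict) -> str:
    # Later, harmless change: pretty-printed with sorted keys, same data.
    return json.dumps(user, indent=2, sort_keys=True)


def brittle_check(serialize) -> bool:
    # Asserts on the exact string, so any formatting change breaks it.
    return serialize(USER) == '{"name": "ada", "roles": ["admin"]}'


def robust_check(serialize) -> bool:
    # Decodes first, so only the actual data has to match.
    return json.loads(serialize(USER)) == USER
```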

I can think of lots of cases where testing is not useful in practice. Simulating random filesystem corruption when writing a Node.js app is pretty much useless: you don't know which files will be corrupted, and simulating all possible combinations would take an enormous amount of time (billions of lifetimes), even for a small program. Yet this can (and in my experience did) take down a production system.
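A back-of-the-envelope version of that estimate, counting only *which* files are corrupted (each file either intact or corrupted, so 2^n cases for n files) and ignoring *how* they are corrupted. The 100 files, 1 ms per test case and 80-year lifetime are assumed numbers chosen for illustration:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetimes_to_enumerate(num_files: int, seconds_per_case: float = 0.001,
                           lifetime_years: float = 80.0) -> float:
    """Human lifetimes needed to try every subset of corrupted files once.

    All parameters are illustrative assumptions, not measurements.
    """
    cases = 2 ** num_files  # each file is either corrupted or intact
    total_seconds = cases * seconds_per_case
    return total_seconds / (SECONDS_PER_YEAR * lifetime_years)
```

With those assumptions, `lifetimes_to_enumerate(100)` comes out on the order of 10^17 lifetimes, so "billions" is, if anything, an understatement; allowing for partial or byte-level corruption only makes it worse.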

The other examples I already listed are, I think, examples of this too: you can test all you want against a spec provided by your external API maintainer, but if they don't follow the spec, you have a problem. We don't have a good framework (and I don't think we can create one) for testing layout problems in HTML that are caused by bad JS or CSS.

Basically, you do hit diminishing returns quickly, and you do get a false sense of security by having lots of small unit tests that don't actually prevent realistic failures.

As it's an Eclipse plugin, I would guess it's aimed at JEE / enterprisey developers.

I write apps similar to the ones you describe and tend to agree that up-front testing is hard to justify a lot of the time. However, when some obscure bug does arise, I do try to write system tests (Selenium, web runners, etc.) or beef up the operational monitoring so I can keep such bugs from recurring.