Hacker News
by angio 1251 days ago
The title mentions writing tests as if they are REPL sessions because you're supposed to iterate until you have the correct result.

How do you know if you have the right result, though? You might know if you have a plausible result. Like, if it outputs -1, then you know something is wrong, I guess.

There's a much higher chance of detecting bugs that give plausible output if you aren't given the opportunity to say "eh, looks plausible, I won't bother double-checking it".

Any programmer dumb enough to just blindly accept that their program is correct is also a dumb enough programmer not to have begun writing a test in the first place. If this gets the friction of writing a test at all so close to zero that these programmers start writing tests (albeit sometimes blindly accepting the output), then it's better than just trying their program on some inputs and calling it a day. It writes down the current output of the program. That's a big step up already. Now people evaluating the code can read some of its outputs without downloading anything.

I personally already use a cycle similar to expect-test when I write tests. A great place to start when writing test assertions is the debug output, which is exactly what this tool uses. Then you convert the output into assertions after you have thought through which parts are right or wrong. That's what you can do with expect-test too, just without the automation. If you don't know whether the output is right or not, just add an assert!(false, "hmm, not sure about this"), a.k.a. todo!(), and voilà: your test fails and future you is prompted to check it over again.

Sometimes the output is obviously wrong, but you still don't know what the right output is. (At this point you know you're doing useful work!) The remedy is the same. Just make the test fail somehow.

> Any programmer dumb enough to just blindly accept that their program is correct is also a dumb enough programmer not to have begun writing a test in the first place.

Then what's the point of this methodology? It requires you to write tests and also blindly accept that your program is correct.

Maybe they should just rename it to "plausibility tests" or similar, because that's what they're really testing. And while that does have some value, I think most of it is negated by the fact that they sound like properly vetted tests, which they are not.

So a more appropriate name would help a lot. I still think it's a bad idea though.

> It requires you to write tests and also blindly accept that your program is correct.

No. You can say no. Just don’t accept it. You’re a human and it asks. Even if you do accept it you can modify it because you have eyes and a keyboard and it’s written right there where you wrote your test.

See https://github.com/rust-analyzer/expect-test for a demo GIF of the Rust version.
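The "it asks, and you can say no" point can be sketched without the crate. This is a hypothetical comparison function, not expect-test's API: a mismatch fails by default, and a changed output is only accepted when the human explicitly opts in. The environment variable name mirrors expect-test's UPDATE_EXPECT, but here it is only checked for presence; the real crate rewrites the expected literal in your source file, where the change then shows up in your diff for review.

```rust
use std::env;

/// Outcome of one snapshot comparison (hypothetical type, not the crate's API).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Output matches what a human previously accepted.
    Pass,
    /// Output changed and nobody opted in to updating: show both and fail.
    Fail { expected: String, actual: String },
    /// Output changed and the human explicitly asked to accept the new value.
    Accept(String),
}

/// Compare `actual` against `expected`; a change is only accepted when `update` is true.
fn compare(expected: &str, actual: &str, update: bool) -> Verdict {
    if expected == actual {
        Verdict::Pass
    } else if update {
        Verdict::Accept(actual.to_string())
    } else {
        Verdict::Fail {
            expected: expected.to_string(),
            actual: actual.to_string(),
        }
    }
}

/// Opt-in flag read from the environment (assumed name, borrowed from expect-test).
fn update_requested() -> bool {
    env::var_os("UPDATE_EXPECT").is_some()
}
```

A test would call `compare(expected, &actual, update_requested())` and panic on `Fail`. Saying no is the default; accepting takes a deliberate extra step.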

> No. You can say no. Just don’t accept it.

Yes you can except...

> You’re a human

Precisely. You're a human. Humans are lazy and bad at manually checking that things are correct, especially if there's an "eh, it's probably fine" option.

This is extremely well studied: https://en.wikipedia.org/wiki/Vigilance_(psychology)

As I said before, it's probably better than nothing in that it will help you detect obviously implausible results. But it really needs to be labelled as such otherwise people will assume that these are properly curated "golden" tests.

Of course. The reason expect-testing is good is that you need significantly less vigilance writing and maintaining the tests than when you write assertions for everything you care about, in exchange for slightly more vigilance on the actual output of your programs. Yes, you need to pay attention to the output, but your attention can now be focused on that instead of being split between it and the job of writing the test.

It's possible to make mistakes when writing out your assertions; those mistakes are just generally more invisible and pernicious. Testing code is code like any other, and its mistakes look like forgetting to test things, botched refactoring of the test or the code, errors when copying tests around, errors when writing out extrapolations, and mistakes from sheer fatigue at the heft of the testing code you're trying to maintain.

Further, the kind of vigilance required for expect-test is mostly not "Tesla kinda driving itself but driver is meant to watch the road", where you are checked out completely, talking to the other passengers or reading a book, yet somehow legally responsible for taking control at any moment. You have your hands on the wheel and the car is offering turn-by-turn GPS directions.
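To illustrate the "forgetting to test things" failure mode, here is a small sketch with a hypothetical `Stats` type. The field-by-field test silently never checks `max`; the snapshot-style test compares the whole Debug rendering, so every field is covered by one string that a reviewer reads once.

```rust
use std::fmt::Debug;

/// Hypothetical value under test.
#[derive(Debug)]
struct Stats {
    count: usize,
    sum: i64,
    max: Option<i64>,
}

fn stats(xs: &[i64]) -> Stats {
    Stats {
        count: xs.len(),
        sum: xs.iter().sum(),
        max: xs.iter().copied().max(),
    }
}

/// Snapshot-style rendering: the whole value, every field, in one string.
fn snapshot<T: Debug>(value: &T) -> String {
    format!("{value:?}")
}

#[cfg(test)]
mod tests {
    use super::*;

    // Field-by-field style: easy to forget a field (nothing checks `max` here).
    #[test]
    fn field_by_field() {
        let s = stats(&[1, 2, 3]);
        assert_eq!(s.count, 3);
        assert_eq!(s.sum, 6);
    }

    // Snapshot style: one comparison covers every field, including `max`.
    #[test]
    fn whole_value() {
        assert_eq!(
            snapshot(&stats(&[1, 2, 3])),
            "Stats { count: 3, sum: 6, max: Some(3) }"
        );
    }
}
```

If a field is later added to `Stats`, the snapshot test fails and forces someone to look at it, while the field-by-field test keeps passing.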

Expect-testing is a good tradeoff in the short term (time to create tests) and in the long term (quality and size of the test suites produced). The evidence is that some pieces of software need so many tests to cover their range of functionality that you cannot test them any other way than in this style. I am talking about testing orders of magnitude more stuff than you could check by hand. A great example is the Rust compiler's UI test suite (https://github.com/rust-lang/rust/tree/master/tests/ui). Your tests don't have to carry large amounts of noise the way compiler UI tests do; you can write focused, noise-free tests with this method, as the original post showed. The main thing is that writing tests faster results in bigger test suites and more opportunities to look at the same code on different inputs. I would rather have two dozen tests that required me to look at their output than three tests that made me think hard about every single assertion. It's just a better use of your time. The rewards compound with the massively reduced cost of maintaining the test suite: the tests update themselves when the code does.
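A minimal sketch of the "same code on many inputs" idea, unrelated to the actual rustc UI test harness. Parsing a `u8` with the standard library stands in for the code under test. Each input becomes one line of a transcript, so covering another case costs one more string in the list, and a reviewer reads the whole table at once.

```rust
use std::fmt::Write;

/// Render one line per input: the input and the Debug form of parsing it as a u8.
/// `str::parse::<u8>` is only a stand-in for whatever code is being tested.
fn transcript(inputs: &[&str]) -> String {
    let mut out = String::new();
    for input in inputs {
        // `{:?}` quotes the input, so empty strings and stray spaces stay visible.
        writeln!(out, "{:?} => {:?}", input, input.parse::<u8>()).unwrap();
    }
    out
}
```

For the inputs `["42", "", "300", "-1", " 7"]` the transcript contains lines such as `"300" => Err(ParseIntError { kind: PosOverflow })` and `" 7" => Err(ParseIntError { kind: InvalidDigit })`. The last line is exactly the kind of plausible-looking result that still needs a human to decide whether rejecting leading whitespace is intended.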

Overall, yes you have identified the negative part of the tradeoff. But you seem to have missed every single one of the benefits.

It's a REPL, so you build the final output incrementally. Testing becomes part of the development workflow, the way it is in REPL-driven languages like Lisps.

For example, you start with the inputs, apply the first layer of transformations, and check that what it does makes sense. Then maybe you refactor it out into its own function and add the generated test for it. Then you move on to the next step, and so on, until you have the final result.
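Here is a sketch of that step-by-step flow on a hypothetical "name: score" input format; all function names are made up for illustration. Each step was first run and its Debug output inspected, then frozen as a snapshot before moving on. With expect-test the expected strings would be filled in by the tool; here they are written by hand.

```rust
/// Step 1: split raw input into trimmed, non-empty lines.
fn clean_lines(input: &str) -> Vec<&str> {
    input.lines().map(str::trim).filter(|l| !l.is_empty()).collect()
}

/// Step 2: turn each "name: score" line into a pair, dropping lines that don't parse.
fn parse_scores<'a>(lines: &[&'a str]) -> Vec<(&'a str, u32)> {
    lines
        .iter()
        .copied()
        .filter_map(|line| {
            let (name, score) = line.split_once(':')?;
            let score = score.trim().parse::<u32>().ok()?;
            Some((name.trim(), score))
        })
        .collect()
}

/// Step 3: total all scores.
fn total(scores: &[(&str, u32)]) -> u32 {
    scores.iter().map(|&(_, score)| score).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    const INPUT: &str = "alice: 3\n\n  bob: 4\nnot a score\ncarol: x\n";

    // One snapshot per step, added as each step was written and inspected.
    #[test]
    fn step1_lines() {
        assert_eq!(
            format!("{:?}", clean_lines(INPUT)),
            r#"["alice: 3", "bob: 4", "not a score", "carol: x"]"#
        );
    }

    #[test]
    fn step2_scores() {
        let lines = clean_lines(INPUT);
        assert_eq!(
            format!("{:?}", parse_scores(&lines)),
            r#"[("alice", 3), ("bob", 4)]"#
        );
    }

    #[test]
    fn step3_total() {
        let lines = clean_lines(INPUT);
        assert_eq!(total(&parse_scores(&lines)), 7);
    }
}
```

Looking at the step 2 snapshot is where you would notice that "carol: x" vanished silently, and decide whether that is acceptable before building step 3 on top of it.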