Hacker News new | ask | show | jobs
by YorkshireSeason 2855 days ago

   It took about a week before 
   I was happy with it. 
And what made you thing that "I was happy with it" is enough from the point of view of the pursuit of scientific truth (e.g. how did you deal with the problem of confirmation bias)? More than half a century of experience with software engineering has told us the hard way that it's near impossible to write bug-free code. And where this happens (e.g. aviation software) it takes extreme dedication and resources.

There is a reason that Alan Turing invented program verification in the 1940s [1].

Let me finish with a quote from computing pioneer M. Wilkes [2]: "I well remember when this realization first came on me with full force. The EDSAC was on the top floor of the building and the tape-punching and editing equipment one floor below. [...] It was on one of my journeys between the EDSAC room and the punching equipment that "hesitating at the angles of stairs" the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs."

[1] A. M. Turing, Checking a Large Routine. http://www.turingarchive.org/browse.php/b/8

[2] M. Wilkes, Memoirs of a Computer Pioneer, MIT Press, 1985, p. 145.

1 comments

The hard part wasn't getting the bugs out (we didn't find anything substantial), it was making it run on a machine different from my laptop, making it take non-hardcoded data and parameters, cleaning the code up and documenting sufficiently that we could reasonably understand it after not looking at it for half a year, and packaging it in such a way that users wouldn't have to spend half a day installing it.

I don't think you can reasonably expect software that extracts features from photographs to be proven correct. What would the specification even look like? How many man years would you spend on verifying OpenCV and the two dozen other dependencies you rely on? It's not like all math papers provide machine checked proofs and that would probably be an easier endeavor.

You label a few photos yourself and then check if the labels still correspond in the newer versions of the program.

Then when you edit any parts of the code and the test suite breaks - you can see that some of the invariants had been broken.

This is even more important for python, as at least having a simple test suite can already tell whether your environment is sane. And that's about 90% of the newcomer's time saved.

That is completely different from formal verification. Simple testing is something all researchers that I know do, quite extensively actually.
I agree that formal verification in this problem space is currently not feasible (even if we ignore the oracle problem: where to get the spec from). But let's not make the perfect the enemy of the good. Even a basic reproducible test suite (e.g. running the software on some very basic pictures with a known result, and running against similar software) in an automated fashion, would help. Almost no scientific software I've seen get even those basics right.