Hacker News
by fwdpropaganda 2856 days ago
Having been in academia, I can tell you that in my experience there are two main reasons why scientists are often reluctant to share their code:

A) embarrassment at having other people see how bad it is. Not necessarily wrong, but bad in other ways.

B) wanting to keep an advantage by continuing to publish on top of the work already done, whereas anyone else wanting to get in on the same idea would have to first re-build the original work.

8 comments

Scientists have to have the truth of their malpractice rubbed hard against their face before they admit that they have a problem. Since Aristotle, science has been fraught with what essentially amounts to bullshit, and it only ever changes once so many people call bullshit so frequently that a new standard develops.

Not pre-publishing a research thesis / methodology is bullshit.

Not publishing data is bullshit.

Not publishing exact code (and runtime, etc) is bullshit.

Not publishing when research is complete is bullshit[0].

Not publishing is bullshit. I don't care who your funder is, it should be illegal for funders to selectively publish research. It's obviously one-sided propaganda. The only time not publishing makes sense is when there are national security or global security ramifications to the research, or if funding needs to be lined up for patents.

[0] I've seen a PhD candidate delayed 4 years from publishing because her supervisor was trying to coordinate a team to publish research in the same vein all at once, to make it a groundbreaking set of discoveries. It's basically p-hacking by some other name.

Not academia, but in finance, I remember an email interaction that was very nearly company-wide at a large hedge fund. In both our production & development environments, all generated core files on our Linux servers were configured to be dumped to a shared NFS mount (ostensibly to ease debugging). Each environment had around a 1 or 2 TB mount. Teams were expected to clean up their cores when they were no longer needed (useful for debugging those heisenbugs). Nearly all of the code was C++. Think 10s to 100s of millions of lines of C++ that had to work together spread across hundreds of services, countless libs that were used to interconnect.

Anyways, we were working on a fairly large company-wide effort. We were migrating from 32-bit to 64-bit, had 2 OS upgrades (new version of RHEL, migrating from Windows XP to Windows 7), and 2 compiler upgrades (I forget the gcc versions, the new one came with the upgrade; on Windows it was VS2003 to VS2008). Because of the coupling of the libs & services, both the Linux & Windows sides had to be released at the same time. We could update the Windows boxes at any time and run the old software on them, but we basically couldn't develop the old software on Win7 as VS2003 wasn't compatible (it could be made to work, but the IDE or compiler or linker might fail randomly and hard). This is just to explain the scope of the effort we were making.

Back to why I mentioned the core files. To anyone who's done it, the magnitude of effort involved in doing all this at once is obvious. There will be bugs. Specifically a metric-shit-ton of them. Those core files were needed. They stopped being generated because our shared core space filled up. The culprit? A PhD quant running some model in Python using some C/C++ extensions kept crashing Python, each crash dropping a multi-GB core file. This one quant was single-handedly using over half of the shared space. When confronted/inadvertently shamed in a development-wide email (we sent out usage breakdowns per user/project when a usage threshold was exceeded), his response was golden: "What are these core files, and how do I stop them from being generated?" Uniform response from everyone else: "Fix your damn code!" Mind, at the time, this was in front of ~600 developers.

The kicker: the reason this whole effort was being made? Because someone sent the CEO an XLSX spreadsheet in 2008 (the exact year may have been later, but immaterial to the story), and still being on WinXP & Office 2003, he couldn't open it. So...down the rabbit hole we went.
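For what it's worth, the quant's question does have a short answer. A minimal sketch, assuming a Linux/Unix host and Python's standard-library `resource` and `faulthandler` modules; the function names `disable_core_dumps` and `enable_crash_tracebacks` are made up for illustration, and none of this is what the fund actually did:

```python
import faulthandler
import resource
import sys


def disable_core_dumps():
    """Set this process's soft core-file size limit to 0 (Unix only).

    Only the soft limit is lowered; the hard limit is left untouched,
    so the change can be undone later in the same process.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def enable_crash_tracebacks():
    """Dump a Python-level traceback to stderr if a C/C++ extension segfaults.

    This gives a starting point for debugging without a multi-GB core file.
    """
    faulthandler.enable(file=sys.stderr)
    return faulthandler.is_enabled()
```

Calling both functions at the top of the model script would roughly match running `ulimit -c 0` in the shell before launching it. It only stops the disk from filling up; "fix your damn code" still applies.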

It is not rational to hold code to a lower standard than experimental technique.

As for B, that harkens back to the pre-scientific attitudes of medieval alchemy.

B is utterly unacceptable if you're funded by public money. As a matter of fact, I don't understand why public research grants don't come with a binding obligation to make all results available, including source code.

While still in academia I wrote some code to extract some data from images. It took about a week before I was happy with it. We then decided to make it available for other people. Polishing it until somebody else could use it with reasonable effort took several months and a team of students. That is time that could have been spent writing more papers. It is not at all clear whether enough people actually use the software to justify the effort.

Researchers aren't paid (and usually aren't trained) to write software that can be installed on more than one computer. I don't like that situation either, but that's how it currently is. Research code also suffers from accelerated bit rot. The dependencies are often research code themselves, and projects are abandoned when the results are published.

   It took about a week before 
   I was happy with it. 
And what made you think that "I was happy with it" is enough from the point of view of the pursuit of scientific truth (e.g. how did you deal with the problem of confirmation bias)? More than half a century of experience with software engineering has taught us the hard way that it's near impossible to write bug-free code. And where this happens (e.g. aviation software) it takes extreme dedication and resources.

There is a reason that Alan Turing invented program verification in the 1940s [1].

Let me finish with a quote from computing pioneer M. Wilkes [2]: "I well remember when this realization first came on me with full force. The EDSAC was on the top floor of the building and the tape-punching and editing equipment one floor below. [...] It was on one of my journeys between the EDSAC room and the punching equipment that "hesitating at the angles of stairs" the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs."

[1] A. M. Turing, Checking a Large Routine. http://www.turingarchive.org/browse.php/b/8

[2] M. Wilkes, Memoirs of a Computer Pioneer, MIT Press, 1985, p. 145.

The hard part wasn't getting the bugs out (we didn't find anything substantial), it was making it run on a machine different from my laptop, making it take non-hardcoded data and parameters, cleaning the code up and documenting sufficiently that we could reasonably understand it after not looking at it for half a year, and packaging it in such a way that users wouldn't have to spend half a day installing it.
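To make the "non-hardcoded data and parameters" step concrete, here is a minimal sketch using Python's standard `argparse`. The option names (`--threshold`, `--output`) and their defaults are invented for illustration; they are not taken from the tool described above.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command-line interface replacing paths and constants hardcoded in a script."""
    parser = argparse.ArgumentParser(
        description="Extract data from images (illustrative sketch)."
    )
    parser.add_argument("images", nargs="+", type=Path,
                        help="input image files")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="hypothetical detection threshold (default: 0.5)")
    parser.add_argument("--output", type=Path, default=Path("results.csv"),
                        help="where to write results (default: results.csv)")
    return parser


def main(argv=None):
    """Parse the arguments and report the settings used for each input."""
    args = build_parser().parse_args(argv)
    # The real extraction would go here; this sketch only echoes the settings
    # so that every run records which parameters were used.
    for image in args.images:
        print(f"{image}: threshold={args.threshold} -> {args.output}")
    return args
```

For example, `main(["a.png", "b.png", "--threshold", "0.8"])` processes two files with a non-default threshold, and `--help` comes for free, which is already a small piece of the documentation work.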

I don't think you can reasonably expect software that extracts features from photographs to be proven correct. What would the specification even look like? How many man years would you spend on verifying OpenCV and the two dozen other dependencies you rely on? It's not like all math papers provide machine checked proofs and that would probably be an easier endeavor.

You label a few photos yourself and then check if the labels still correspond in the newer versions of the program.

Then, when you edit any part of the code and the test suite breaks, you can see that some of the invariants have been broken.

This is even more important for Python, as even a simple test suite can already tell you whether your environment is sane. And that's about 90% of the newcomer's time saved.
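The workflow described above could look roughly like this. It is a sketch, not anyone's actual pipeline: `classify` is a toy stand-in for the real image code, the "photos" are tiny grayscale grids instead of files, and the labels play the role of the hand-assigned expectations.

```python
def classify(image):
    """Toy stand-in for the research code: label an image by mean brightness."""
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


# Hand-labelled examples, checked in next to the code. In practice these
# would be real photos on disk plus a small file of labels.
LABELLED_EXAMPLES = [
    ([[250, 240], [230, 255]], "bright"),
    ([[10, 0], [25, 5]], "dark"),
    ([[200, 100], [120, 130]], "bright"),  # mean brightness 137.5
]


def run_regression_suite():
    """Return the examples whose label changed; an empty list means all pass."""
    failures = []
    for index, (image, expected) in enumerate(LABELLED_EXAMPLES):
        actual = classify(image)
        if actual != expected:
            failures.append((index, expected, actual))
    return failures
```

Run it after every change, and once on every fresh install: if it fails on a new machine before anyone has touched the code, the environment is the first suspect.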

That is completely different from formal verification. Simple testing is something all researchers that I know do, quite extensively actually.

I agree that formal verification in this problem space is currently not feasible (even if we ignore the oracle problem: where to get the spec from). But let's not make the perfect the enemy of the good. Even a basic reproducible test suite, run in an automated fashion (e.g. running the software on some very basic pictures with a known result, and running it against similar software), would help. Almost no scientific software I've seen gets even those basics right.
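A sketch of the two checks mentioned here: a synthetic picture whose answer is known by construction, and a comparison against an independent implementation. Both `centroid_fast` and `centroid_reference` are hypothetical stand-ins written for this example; with real software the reference would be a different package.

```python
def make_blob_image(width, height, cx, cy):
    """Synthetic image: all zeros except a single bright pixel at (cx, cy)."""
    image = [[0] * width for _ in range(height)]
    image[cy][cx] = 255
    return image


def centroid_fast(image):
    """'Software under test': intensity-weighted centroid via running sums."""
    total = sx = sy = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += value
            sx += x * value
            sy += y * value
    return (sx / total, sy / total)


def centroid_reference(image):
    """Independent implementation: collect the lit pixels, then average them."""
    weighted = [(x, y, v) for y, row in enumerate(image)
                for x, v in enumerate(row) if v > 0]
    weight = sum(v for _, _, v in weighted)
    return (sum(x * v for x, _, v in weighted) / weight,
            sum(y * v for _, y, v in weighted) / weight)


def check_known_result():
    """Known answer by construction: the centroid is the bright pixel itself."""
    return centroid_fast(make_blob_image(8, 6, cx=5, cy=2)) == (5.0, 2.0)


def check_against_reference():
    """Differential check on an image with two unequal bright pixels."""
    image = make_blob_image(8, 6, cx=1, cy=1)
    image[4][6] = 85  # second, dimmer pixel at x=6, y=4
    a, b = centroid_fast(image), centroid_reference(image)
    return all(abs(p - q) < 1e-9 for p, q in zip(a, b))
```

Neither check proves correctness, but together they catch the environment breakage and silent regressions that otherwise surface only when someone else tries to reproduce the paper.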

I would rather the publication give a clear enough description that someone with time and resources could reproduce the software based on that description and achieve the same results. I think that would actually foster more innovation and collaboration as people implementing the software from the publication will necessarily become much more familiar with the work than they would have otherwise.

When I was in grad school, B was routine and open - not something you'd whisper but something you'd say openly.

In my case, it is very much column A.

If you're embarrassed by your research code, how do you know it works, and how do you know what you're going to publish isn't built on quicksand?

Embarrassed by its cobbled-togetherness. Not in whether or not it works.

Not really answering the question, which was "how do you know it works?"

I would contend that if you have proven to yourself that your code works and are therefore capable of proving it to other folks (via e.g. solid testing), you should not be ashamed of the spit and glue.

It is research code, we all know what that looks like. But if, on the other hand, you haven't proven to yourself it works, then it's definitely something to be ashamed of - scientifically speaking.

It’s much the same reason why you wouldn’t go to work wearing a days-old, stained shirt and ripped pants: yeah, you’re more than likely going to work just as hard and as well as if you were wearing clean clothes, but that’s not going to stop people from judging you based on your appearance.

Most academics write code to just work. Not work well, or to be generalized, or to be efficient, just work. And while that’s absolutely fine, as your results being reproducible from the code is all that really matters, a lot of people don’t see it that way and will only see code slapped together haphazardly and dismiss you because of it.

This article shows very clearly why that does not work. The Princeton team could not reproduce the Berkeley results, yet the inaccessibility of the code meant that the latter persisted as a road-block for almost a decade.

Imagine if this reproducibility excuse were applied to experimental results and technique: we don't have to be careful or explain in detail what we are doing, as reproducibility will take care of any errors. One consequence would be that, as the current state of knowledge became less certain, it would become less clear what to do next.

Wearing dirty clothes to work doesn't end up wasting 7 years of other people's time.

Publishing scientific papers that no one can re-implement does, and hugely so.

Therefore, not a very adequate comparison.

"Maths" is how you know it works.

Especially in fields like physics there are many limits that you can derive analytically. Sometimes those are highly non-trivial.
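As an illustration of checking code against an analytically derived limit (the example is an assumption of this sketch, not something from the thread): relativistic kinetic energy, (gamma - 1) m c^2, must reduce to the classical m v^2 / 2 when v is much smaller than c, with a leading relative correction of 0.75 (v/c)^2.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by definition)


def kinetic_energy_relativistic(mass, speed):
    """Relativistic kinetic energy (gamma - 1) * m * c^2, in joules."""
    gamma = 1.0 / math.sqrt(1.0 - (speed / C) ** 2)
    return (gamma - 1.0) * mass * C ** 2


def kinetic_energy_classical(mass, speed):
    """Classical kinetic energy m * v^2 / 2, in joules."""
    return 0.5 * mass * speed ** 2


def relative_deviation(mass, speed):
    """How far the relativistic result is from the classical one."""
    classical = kinetic_energy_classical(mass, speed)
    return abs(kinetic_energy_relativistic(mass, speed) - classical) / classical


def check_low_speed_limit(mass=1.0, speed=3.0e5):
    """At roughly v = 0.001 c the deviation should be about 0.75 * (v/c)^2."""
    expected = 0.75 * (speed / C) ** 2
    return math.isclose(relative_deviation(mass, speed), expected, rel_tol=1e-2)
```

A check like this tests the numerics against the maths rather than against the code's own earlier output, so it can catch mistakes that a plain regression test would happily preserve.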

I understand the embarrassment. I am sometimes embarrassed as well when looking at code I put together quickly.

I do appreciate papers that give you either a reference implementation or at least enough details on how it was done. I've spent too many hours trying to reproduce the claimed results of papers that didn't...

Usually testing. Just because code works flawlessly doesn't mean it isn't embarrassing; and just because you are proud of the code doesn't mean it's bug free.

It's the most common one, that's why I put it first :D