Hacker News
by svalorzen 1943 days ago
I don't really agree with the reasons given, even though my conclusions are the same. The main reason why research code becomes a tangled mess is due to the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time. Moreover, you have no idea in advance where your experiments are going to take you, thus giving no opportunity to structure the code in advance so it is easy to change.

To make a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

The closest to "orderly" I think research code can become would be akin to Enterprise style coding, where literally everything is an interface and all implementation details can be changed in all possible ways. We already know how those codebases tend to end..

13 comments

As someone who has been on both the research and industry software end, there’s really not that much difference. Requirements change, you build that into your plans. Frankly, a lot of best practice software development that gets totally ignored by academia (e.g. OOP) can handle this exact case, and makes things way more flexible.

If the problem was only unpredictability, then projects with a clear and defined end goal (eg, a website to host results) would be of substantially higher quality. But they’re not. Well defined projects tend to end up basically just as crappy as exploratory projects.

The problem is evaluation and incentives. There’s literally no evaluation of software or software development capability in the industry. I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money. Usually there are grant updating mechanisms, and reports, but he bsed his way through that knowing there’s a 0.0000000% chance that any granting agency is going to look through his code. The fraud was only found because he got fired for unrelated activities.

I once looked up older web projects on a grant. 4/6 were completely offline less than 2 years after their grants completed. For 2 of those 4, it’s unclear whether the site ever completed in the first place.

>I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money.

I hate that every HN post about academia ends with an anecdote describing some rare edge-case they've heard about. Intentional academic fraud is a very small percentage of what happens in academia. Partly this is because it's so stupid: academia pays poorly compared to industry, requires years to establish a reputation, and the systems make it hard to extract funds in a way that would be beneficial to the fraudster (hell, I can barely get reimbursed for buying pizza for my students.) So you're going to do a huge amount of work qualifying to receive a grant, write a proposal, and your reward is a relatively mediocre salary for a little while before you shred your reputation. Also, where is your "collected money" going? If you hire a team, then you're paying them to do nothing and collude with you, and your own ability to extract personal wealth is limited.

A much more common situation is that a researcher burns out or just fails to deliver much. That's always a risk in the academic funding world, and it's why grant agencies rarely give out 5-10 year grants (even though sometimes they should) and why the bar for getting a grant is so high. The idea is to let researchers do actual work, rather than having teams manage them and argue about their productivity.

(Also long-term unfunded project maintenance is a big, big problem. It's basically a labor of love slash charitable contribution at that point.)

> I hate that every HN post about academia ends with an anecdote describing some rare edge-case they've heard about

This isn’t a rare edge case, this is very common in software projects. I’ve heard of it because I was part of the team brought in to fix the situation.

Intentional fraud is only rare when it’s recognized as fraud. P-hacking was incredibly widespread (and to some extent still is) because it wasn’t recognized as a form of fraud. Do you really think not delivering on a software project has any consequences? Who is going to go in and say what’s fraud, what’s incompetence, and what’s bad luck?

The problem is that the bar for getting software grants isn’t high, it’s nonsensical. As far as I can tell, ability to produce or manage software development isn’t factored in at all. As with everything else, it’s judged on papers, and the grant application. In some cases, having working software models and preexisting users end up being detrimental to the process, since it shows less of a “need” for the money. You get “stars” in their field, who end up with massive grants and no idea of how to implement their proposals. Conversely, plenty of scientists who slave away on their own time on personal projects that hundreds of other scientists depend on get no funding whatsoever.

Just curious, what kind of 3-year informatics grant not being completed ends up with a team brought in to fix the situation? Multi-million dollar grants don't sound big enough to be a dependency for any major customer (like defense or pharma), so I imagine if fraud was detected, they would just demand a reimbursement and ban the PI.

But I think you're both right in some sense. Cases of intentional major fraud are probably a rare edge case, and they make the news when they're uncovered. But there's a lot of grey-ish area like p-hacking as you mentioned, plus funding agencies know there needs to be some flexibility in the proposed timeline due to realities. Realities like you don't necessarily get the perfect student for the project right when the grant starts, as the graduate student cycle is annual, plus the research changes over time and it isn't ideal to have students work on an exact plan as if they are an employee.

But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people who have software generated from the grant that is used by at least 10 external parties without COI should be funded 100K/yr for however many years they are willing to maintain and improve it. Definitions of what this means need to be carefully constructed, of course.

I'll be a bit vague to protect my coworker's privacy, but the scientist was fired for other, unrelated violations, and my boss was brought in to replace him. I think he was leading an arm of a "U" grant, so he wasn't the only senior PI on it. Since they handled it internally, they couldn't just demand a reimbursement. On some level administration knew that the project wasn't moving forward, but once we started asking around, it was clear that there was no effort to start the project at all.

>But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people who have software generated from the grant that is used by at least 10 external parties without COI should be funded 100K/yr for however many years they are willing to maintain and improve it. Definitions of what this means need to be carefully constructed, of course.

I think that this is a great idea.

I can tell you why the sites went offline: the funding stopped. I don't know what your research background is, but it's painful to even get 5 GBP a month to host a droplet on DigitalOcean in a pretty lucrative department with liberal internal funding.
Agreed, but all these little things are just a sign that the industry just does not give a shit about software. They could develop mechanisms to fund this stuff, pretty easily actually. But they don’t.

A couple of other weird inequities that I’ve found are:

1. It’s hard to get permission to spend money on subscription-based software licenses since you won’t “have anything” at the end. However, it’s much easier to get funding for hardware with time-based locks (e.g. after 3 years the system will lock up and you have to pay them to unlock). The end result is the same, you can’t use the hardware after the time period is up, but for some reason the admin feels much more comfortable about it.

2. It’s hard to get funding to hire someone to set up a service to transfer large amounts of data from different places. It’s much easier to hire someone to drive out to a bunch of places with a stack of hard drives and manually load the data on them, and drive back. Even if it’s 2x more expensive and would take longer. Why? Again my speculation is that the higher ups are just more comfortable with the latter strategy. They can picture the work being done in their head, so they know what they’re paying for.

Louisiana state government spent a buttload of money on dedicated high speed fiber optic lines between a bunch of different universities in the state for videoconferencing, telenetworking, "grid computing" etc. 10 years later the only people who remember how to use the system are at LSU, rendering the purpose moot. Everyone else just uses Zoom or Skype.

https://www.regents.la.gov/assets/docs/Finance_and_Facilitie...

> The end result is the same, you can’t use the hardware after the time period is up, but for some reason the admin feels much more comfortable about it.

Simple: predictability. With a subscription-based model, admin has to deal with recurring (monthly / yearly) payments, and the possibility is always there that whatever SaaS you choose gets bought up and discontinued. Something you own and host yourself, even if it becomes useless after three years, does not incur any administrative overhead and there is no risk of the provider vanishing. Also, there are no "surprise auto renewals" or random price hikes.

> 2. It’s hard to get funding to hire someone to set up a service to transfer large amounts of data from different places.

Never underestimate the bandwidth of a 40 ton truck filled with SD cards. Joke aside: off-campus buildings especially have ... less than optimal Internet / fibre connections, and those that do exist are often under enough load to make it unwise to shuffle large amounts of data through them without disrupting ongoing operations.

Is N years of opex not part of the budget in grant applications?
In research, no, and it would depend entirely on your institution. For example, I looked at a job putting together a portal for people to freely examine the research put together by a research team. The project had secured a connection with the British Museum, and so that website would live on under that. However, if the project had asked to host it themselves, even for $60 a year for 10 years, the answer would be no. Funding bodies see small opex that extends beyond the life of the project as open to corruption or just too facile to fund, wrongly or rightly.
> As someone who has been on both the research and industry software end, there’s really not that much difference. Requirements change, you build that into your plans. Frankly, a lot of best practice software development that gets totally ignored by academia (e.g. OOP) can handle this exact case, and makes things way more flexible.

I've done both, and OOP can also make things worse. Now instead of just doing the calculations in a straightforward procedural fashion anyone who knows the research can understand, you've added a layer of structure to obfuscate it, and that structure may be harder to change if you guessed wrongly about what will be consistent and what won't. Research by its nature needs to be more flexible and will be more unpredictable than industry development. It is far more common to have to go back and reexamine even your most basic assumptions.

Of course a lot of researchers are doing the same things as industry (what should be described as development and not be getting research funding), and are certainly doing a much more amateur job of it.

Grant fraud is penalized severely in the US by the way. You can even get a bounty for reporting someone.

I work on a 12+ year academic (full stack Python) codebase, where there was an initial push for an OOP/DI architecture which was key to adapting to later grant requirements. The codebase is still evolving fine.
> I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money.

I wonder if a whistleblower payout similar to the one that the SEC is doing for 1M+ fines (10-30%) would help in cases like this. The host organization would potentially be on the hook as well, so there is going to be a significant incentive to not let that happen (especially with all the associated reputational damage).

> The main reason why research code becomes a tangled mess is due to the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time. Moreover, you have no idea in advance where your experiments are going to take you, thus giving no opportunity to structure the code in advance so it is easy to change.

I'd say you're confirming the author's theory that writing code is a low-status activity. Papers and citations are high-status, so papers are well refined after the research is "done". Code, however, is not. If the code was considered on the same level as the paper, I think people would refine their code more after they finish the iteration process.

Yes... and no. It is true that after a result is obtained, one could clean up the code for publication. And it is true that coding is not seen as first class at the moment.

At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else. Reproducing results can be done with ugly code, and future research efforts will not benefit from the clean up for the same reasons I outlined in my previous post.

While easing code review for other people is definitely helpful (it can still be done if one really wants to, and clean code does not guarantee that people will look at it anyway), overall the gains are smaller than what "standard" software engineers might assume. And I'm saying this as a researcher that always cleans up and publishes his own code (just because I want to mostly).

> At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else.

I assumed that most code published could be directly useful as an application or a library. Considering what you're saying, this might be only a minority of the code. In that case, I agree with your conclusion about smaller gains.

Most academic code runs once, on one collection of data, on a particular file system.

Academic code can be really bad. But most of the time it doesn't matter, unless they're building libraries, packages, or applications intended for others. That's when it hurts and shows.

I'm a research programmer. I have a master's in CS. I take programming seriously. I think academic programmers could benefit from better practice. But I think software developers make the mistake of thinking that just because academics use code the objective is the same or that best practices should be the same too. Yes, research code should perform tests, though that should mostly look like running code on dummy data and making sure the results look like you expect.
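
For what it's worth, the "run it on dummy data and check the results look right" kind of test can be very small. Below is a minimal sketch in Python, not anyone's actual research code: `fit_slope`, `make_dummy_data`, and `smoke_test` are hypothetical names, and the least-squares fit is just an assumed stand-in for whatever the real analysis step is. The idea is to generate data where the answer is known in advance and check that the analysis recovers it within a loose tolerance.

```python
import random


def fit_slope(xs, ys):
    """Hypothetical stand-in for the real analysis: ordinary least-squares slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def make_dummy_data(true_slope, n=200, noise=0.1, seed=0):
    """Synthetic data with a known answer, so the expected result is known up front."""
    rng = random.Random(seed)  # fixed seed keeps the check repeatable
    xs = [i / n for i in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def smoke_test():
    """Run the analysis on dummy data and check the result looks as expected."""
    xs, ys = make_dummy_data(true_slope=3.0)
    estimate = fit_slope(xs, ys)
    # Loose tolerance on purpose: this is a sanity check, not an exactness test.
    assert abs(estimate - 3.0) < 0.2, estimate
    return estimate


if __name__ == "__main__":
    print("estimated slope:", smoke_test())
```

The dummy data file doubles as documentation of what structure the input needs to have, which is often the first thing missing when someone else tries to rerun the code.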

I know a lot of "research programmers" (meaning people who write code in research labs but are not themselves the researchers or investigators on a study), and they often have MS degrees in CS - though actually, highly quantitative master's degrees where very elaborate code is used to generate answers are a bit more common than CS per se (math, operations research, branches of engineering, bioinformatics, etc).

Here's the thing - in industry, this background (quant undergrad + MS, high programming ability, industry experience) is kind of the gold standard for data science jobs. In academic job ladders it's... hmm. Here's the thing - by the latest data, MS grads in these fields from top programs are starting at between 120k-160k in industry, and there are very good opportunities for growth.

I actually think that universities and research centers can compete for highly in-demand workers in spite of lower salaries, but highly talented people in demand will not turn away an industry job with salary and advancement potential to remain in a dead-end job.

Yeah, my standard quote about research code is that it is not the product, so it is ok that it is bad. The results are the product and those need to be good. Someday someone will take those results (in the form of some data or a paper) and make a software product, and that should be good.
I am under the impression that most authors do not even publish functioning code when publishing ML/DL papers which I find to be absurd. The paper is describing software. Imo the code is more important than the written word.
Shouldn't checking for bugs be of primary importance? How many times have impressive research results turned out to be a mirage built upon a pile of buggy code? I get the sense that is far too common already.
> How many times have impressive research results turned out to be a mirage built upon a pile of buggy code?

You're actually making bugs sound like a feature here. I'm pretty sure that if you've gotten impressive results with ugly code, the last thing you want to do is touch the code. If you find a bug, you have no paper.

I think software quality in research has nothing to do with the problems themselves. It's more that, like the article suggests, nobody cares about your software. The only goal is to get published and be cited as many times as possible. Your coding mistakes don't matter if they cannot be found out or hurt your reputation.

How many tests would be written for business software if it had only to run for one meeting and then never be looked at again?

There seems to be an underlying assumption in many of these posts that code has no value once papers are published. This hasn't been my experience working in a research environment at all. The big, complex pieces of code are almost always re-used in some way. For example, theory collaborators send us their code so we can generate predictions from their work without bothering them. Probably 50% or more (and usually the most important parts) of the code written to process experimental data ends up in other experiments. From the perspective of an individual experimentalist, there is tremendous value in creating quality code that can be easily repurposed for future tasks. This core code tends to follow the individual in their career. In some ways it's an extension of commonly used mental tools, and there are diverse incentives to maintain it.
> To make a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

I don't have to imagine it, I'm employed in the software industry.

Seriously, nothing you describe sounds any different from normal software development.

The only difference is speed IMO. Sure, new requirements appear and they can wildly change the underlying assumptions of the systems - but usually, in such cases we're given months or years to adapt/rewrite the system in a systematic manner. If, for every wild idea the researcher wants to explore, this amount of rigor was applied in its implementation, I'm guessing the research would slow down immensely. BTW most research code is written for chasing dead ends (quickly testing some small hypotheses), and will be discarded without sharing with anyone - so, investing in writing it properly seems especially wasteful.
Rapid fundamental changes and short-lived code to explore an idea that will most likely be thrown away are very much the everyday development experience in industry too, IME.
The program I wrote for my dissertation is as good as it needs to be for a program that had to run once!
In my world, it does sound different: I work with HIPAA data that takes months to get access to. So sharing your code is borderline unacceptable to some orgs; even if the code itself doesn't contain any private data, there's a mass paranoia that you'll accidentally leak patient data, which can lead to fines of 2 million USD.
> The main reason why research code becomes a tangled mess is due to the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time.

Oh, boy, how many times have I heard this working at a startup. There is some truth to it, it's hard to organise code in the first weeks of a new project. But if you work on something for 3+ months, it becomes a matter of making a conscious effort to clean things up.

> To make a concrete example, imagine writing an application where requirements changed unpredictably every day,

Welcome to working with product managers at any early-stage company. Somehow I managed to apply TDD and good practices most of the time. Moreover, I went back to school after 7+ years developing software full-time. I guarantee that most of the low-quality research code is a result of a lack of discipline and experience in writing maintainable software.

> I guarantee that most of the low-quality research code is a result of a lack of discipline and experience in writing maintainable software.

Bingo! Most research code is written by graduate students who never had a job before, so they do not know how to write maintainable software. You are definitely the exception, as you held a software dev job before going back to school.

Some researchers from top-10 schools still publish python2 code in 2020. I don't have an explanation for that. It's not even a lack of experience, but something on another level.
Mathematics doesn't suddenly stop working because your interpreter is a bit old.
>To make a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

That sounds like software development, alright. It takes a while for domain experts to learn that if a programmer asks "is X always true/false", they mean that there are no exceptions to that rule.

I would like for researchers to just name variables sensibly. Even that would improve code quality a lot.

Still the key problem is that there are zero incentives for researchers to even make their code readable! It does not improve any of the metrics they are judged by.

Yes, not pointing out the difference between coding some novel technique and a well defined software project completely misses the reason the code is often not well organized. Suggesting that researchers are bad programmers is just a lazy excuse, somewhat damaging, and by no means the rule. I wrote a large complex framework for my research and the very nature of it causes me to add modules and techniques for parts I didn't know would work. And at times hard forks for when I wanted to try something new, which would be impossible to merge back cleanly. At times you have a hunch and, like a fever dream, change who knows what, but you just have to see something through. There is no waterfall method, kanban and agile make no sense here, and even unit tests are ill-defined.
This sounds like my software development methodology when I was in my early teens. I was certainly able to get things done and explore all kinds of things (I was doing game dev of course), but the code was a mess and I didn't even have a mature understanding that it was. I just thought that was how programming was and you just had to be really smart to keep things straight in your head.
Do you think that people doing research at large technical organizations structure their code in the same way as academics? No, although there's always a portion which is active and unstable, they create packages, define interfaces, abstract out pieces which can be reused reliably and depended on. Similarly for other types of researchers in fields where the code is considered an important product. Eg. if you are doing research in compiler design, you're likely to want to create a compiler which can be used by other people. So you make a stable thing with tests, automated builds and so on. And you delimit and instrument the experimental parts.

The real reason is the incentives. Not just are there no incentives to produce good quality code, there are incentives which make people focus on other outputs. Publish or perish means that people put up with technical debt just to get to the next result for the next paper, then do it again and again.

>The real reason is the incentives. Not just are there no incentives to produce good quality code, there are incentives which make people focus on other outputs. Publish or perish means that people put up with technical debt just to get to the next result for the next paper, then do it again and again.

I believe this is true and is fueled by a misconception of what software is in research. Software in research is often akin to experimentalist work in the past. It's tacked onto theoretical work projects as an afterthought and not treated as what it really is: forcing the theory to be tested in a computational environment.

If we start treating research software like experimentalism in the past, we might get a bit more rigor out of the development process as well as the respect it really deserves.

I've worked with a lot of research code. I agree with you that tangled code is somewhat intrinsic to the kind of code written for research.

Here's the thing. Sometimes, there's no code - I mean, they'll find something, but nobody can say, with certainty, that it is the code that generated the data or results you're trying to recreate. There's often no data - and by that, I mean, nothing, not even a dummy file so you can tell if it even runs or understand what structure the data needs to be in. No build, no archive history, no tests. And when I say no tests, I'm not talking about red/green bar integration and unit tests, I mean, ok, the code ran... was this what it was supposed to produce?

Many of these projects are far, far more messed up than the intrinsic nature of research would explain - though I will again agree that research code may be unusually likely to descend into entropy.

There certainly is quite a lot to be said about constant requirements drift. However, this is not atypical of fast-paced product work or, even more closely, R&D effort within the industry.

What then drives the improvement of the code quality is the potential need for continuity and knowledge retention - either in the form of iterative cleaning of the debt or a rewrite. This is reliant on the perceived value for the organisation. From this perspective it's more straightforward to get to the author's reasons.

> imagine writing an application where requirements changed unpredictably every day

Imagine?

There's only one way to solve this: Simplicity.

Ironically this is also what Occam's razor would demand from good science, so you'd have a win-win scenario, where you create both good software and good research, because you focus on the simplest, most minimal approach that could possibly work.

How do you keep a codebase simple when you need to have things in it like implementations of state of the art algorithms to compare against and the previous iterations of your own method so that you can test whether you're actually improving? Then, depending on what you're doing, there's also all the extra nontrivial code for tests and sanity checks of all these implementations.

Simplicity is a nice dream. The realities of research are very often stacked against it.

How the heck do you hope to gain any insightful metrics when you've got a cobbled-together mess that you only half understand? For what it's worth, you might only be benchmarking random code layout fluctuations.

I've seen research groups drown in their legacy code base.

The issue of juggling too many balls you describe is one you only have to begin with because the state of the art implementations are so shoddy to begin with.

Research suffers as much as everybody else from feature creep. Good experiments keep the number of new variables low.

Research code is not only written to measure runtime. Reducing the argument to only that aspect is not helping the discussion.

And you say it yourself: good experiments change a single variable at a time. So how do you check that a series of potential improvements that you are making is sound?

I can't quite follow what the article is trying to describe because of the heavy use of analogies.

A Google search makes it look like Julia has a mechanism where you can extend the set of overloads of a function or method outside the original module. The terminology is different (functions have methods instead of overloads in their speak). I don't see how that feature solves the problem in practice.

In my experience simplicity and generality don’t go well with performance. If you want to build something that can be used for all kinds of problems and it is simple, it will be slow as hell compared to the (dirty) optimised code running hardcoded structures on the GPU.
Simplicity pretty much excludes generality in a lot of cases, you're only able to port code to the GPU if it wasn't a million LOC to begin with, so you're pretty much making the case for it.

Note that Simple != Easy or Naive

Hardcoded structures is potentially exactly the kind of simplicity needed.

What's not simple is a general "this solves everything and beyond" code-base with every imaginable feature and legacy capability.

>It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time.

This is describing infinitely fast and efficient p-hacking (i.e. research that is likely to produce invalid results).

If your assumptions are broken then that should ideally be reported as part of your research.

When you do research, you ideally start out with fixed assumptions, and then test those assumptions. The code required to do this can be buggy (and can therefore get fixed), and you can re-purpose earlier code, but the assumptions/brief shouldn't change in the middle of coding it up.

If you aren't following the original brief, you've rejected your original research concept and you're now doing a different piece of research than you started out - and this is no longer a sound piece of research.

Research should be highly dissimilar to a web design project in this respect.

The reason these projects often become a tangled mess is because researchers don't have the coding skill to program any other way (in my opinion, and nor do institutions invest sufficiently in people who do have this skill).