by dmlorenzetti 2130 days ago
Hard-coded file paths for input data. File paths hard-coded to use somebody's Google Drive so that it only runs if you know their password. Passwords hard-coded to get around the above problem.
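For the hard-coded path problem, one common alternative is to take the input location from the command line or an environment variable. A minimal sketch, assuming a hypothetical `--input` flag and a hypothetical `SIM_INPUT` variable (neither comes from any project mentioned here):

```python
import argparse
import os
from pathlib import Path


def resolve_input_path(argv=None, env=None):
    """Return the input path from --input, falling back to $SIM_INPUT.

    Both the flag name and the variable name are hypothetical.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Run the analysis.")
    parser.add_argument(
        "--input",
        default=env.get("SIM_INPUT"),
        help="path to the input data file",
    )
    args = parser.parse_args(argv)
    if args.input is None:
        # Exit with a usage message instead of silently using a baked-in path.
        parser.error("no input given: pass --input or set SIM_INPUT")
    return Path(args.input)
```

The same idea applies to credentials: read them from the environment or a secrets store at run time rather than committing them to the source.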

In-code selection statements like `if( True ) {...}`, where you have no idea what is being selected or why.

Code that only runs in the particular workspace image that contains some function that was hacked out to make things work during a debugging session 5 years ago.

Distributed projects where one person wrote the preprocessor, another wrote the simulation software, and a third wrote the analysis scripts, and they all share undocumented assumptions worked out between the three researchers over the course of two years.

Depending on implementation-defined behavior (like zeroing out of data structures).

Function and variable names, like `doit()` and `hold`, which make it hard to understand the intention.

Files that contain thousands of lines of imperative instructions with documentation like "Per researcher X" every 100 lines or so.

Code that runs fine for 6 hours, then stops because some command-line input had the wrong value.
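The six-hour failure is usually avoidable by validating every input before any expensive work starts. A rough sketch, assuming hypothetical `--steps` and `--output-format` options:

```python
import argparse


def parse_args(argv=None):
    """Parse and validate all inputs before the long computation begins."""
    parser = argparse.ArgumentParser()
    # Hypothetical options, for illustration only.
    parser.add_argument("--steps", type=int, required=True)
    parser.add_argument("--output-format", choices=["csv", "json"], default="csv")
    args = parser.parse_args(argv)
    if args.steps <= 0:
        parser.error("--steps must be a positive integer")
    return args


def main(argv=None):
    args = parse_args(argv)  # bad values cause an exit here, before any work
    # ... the long-running computation would go here ...
    return args
```

The point is only the ordering: anything that can be checked from the command line alone gets checked first, so a typo costs seconds rather than hours.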

I've seen all of these over the years. Even as a domain expert who has spoken directly with authors and project leads, this kind of stuff makes it very hard to tease out what the code actually does, and how the code corresponds to the papers written about the results.

1 comment

You’re giving me flashbacks! I spent a year as an admin on an HPC cluster at my university, building tools/software, helping researchers get their projects running, and leading the implementation of container usage. The amount of scientific code that required libraries or files to be in specific locations, assumed that everything was being run from a home directory, or sourced shell scripts at run time (which would break in containers) was staggering. A lot of it had a clear “this worked on my system so...” vibe about it.
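For the "everything runs from a home directory" assumption, one option is to locate bundled files relative to the script itself rather than the current or home directory. A small sketch, where `resource_path` and its `base` parameter are hypothetical names:

```python
from pathlib import Path


def resource_path(name, base=None):
    """Locate a bundled file relative to this module, not the working directory.

    `base` is a hypothetical override, mainly useful for testing.
    """
    base = Path(__file__).resolve().parent if base is None else Path(base)
    path = base / name
    if not path.is_file():
        # Fail with the full path so the missing file is easy to diagnose.
        raise FileNotFoundError(f"expected resource at {path}")
    return path
```

This tends to survive being moved into a container, since it makes no assumption about where the job was launched from.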

As an admin it was quite frustrating, though I can understand it when you know the project was never tested in a distributed environment. But when it’s the projects that do know how they’re used and still do those things...