|
>There can be a big opportunity cost in trying to rework a workflow so that it is more efficient and then test it thoroughly ensure correctness. Hi, I recognize your name as a legit bioinformatician, am a huge fan of the lab that you're currently in, and others should listen to you. I'd like to add that for many projects, general reusable software engineering is not necessarily a huge advantage. Instead of verifying a single implementation, it's often better for somebody to reimplement the idea from scratch; if a second implementation in a different language written by a different programmer gets the same results, this is a much more thorough validation of the software than going over prototype software line by line. Also, I've seen way too many software engineers come in with an enterprisey attitude of establishing all sorts of crazy infrastructure and get absolutely no work done. If Java is your idea of a good time, it's unlikely that you'll be an effective researcher (though it's not unheard of), because it's not good at maximizing single-programmer output, and not good at maximizing I/O or CPU or string processing. In research it's best to get results, fail fast fast fast, and move on to the next idea. If you're lucky, 1 in 20 will work out. Publish your crap, and if it's a good idea, it will be worth polishing the turd later, but it's better to explore the field then to spend too much time on an uninteresting area. The only time you worry about efficiency is when it enables a whole other level of analysis. So, for example, UCSC does most of their work in C, including an entire web app and framework written in C, because when they were doing the draft assembly of human genome a decade ago on a small cluster of computers that they scrounged from secretaris' desks over the summer, Perl wouldn't cut it. |
Reproducible code: extremely important. Correct code: extremely important. Readable code: very important. Efficient code: often not as important.
Even today, the UCSC Genome Browser is an example where efficient code is important. It is interactive software, has many human users who can work much efficiently when the browser is responsive. And with projects like ENCODE, there are now incredible amounts of data available from the browser that would not be easily possible with a less efficient system.
Very different from an analysis system that will be run a handful of times in batch mode.