by mgillett 4882 days ago
I think this is more a comment on how the system is broken. Researchers should be notified of new libraries in their area. At the very least, they should be able to consult a single site that everyone uploads their code to (think GitHub for science, with more emphasis on exploration). Academic journals are not the only channels carrying useful information.

The problem isn't the discoverability of these libraries; it's that they are generally not much use to anyone except their authors. One common type of library handles data transformation, normalization, and maybe even workflows. These abound. But they are rarely useful in other people's hands, because to extend them and actually get any work done, you need to spend as much time learning them as it would take to write the code from scratch. And the advantage of writing it from scratch is that you know it intimately, including all of its assumptions and flaws, which you don't know about somebody else's code, even if it's extremely well documented.

Take something like Taverna [1], which is probably very useful to some people and had been recommended to me enthusiastically by many. After spending three hours reading documentation and searching the web, I still could not get it to do what I needed, so I wrote a simple one-off bash script that interfaced with our cluster system. Alternatively, I could try to hack loops into it, but that would take me 10x as long, would require me to interact with many other people who obviously don't understand my problem (since they did not consider it a fundamental need), and the changes might not even be accepted back into the mainline, at which point I'm off on my own fork and lose the benefit of using a common code base. Waiting 1-10 hours to hear back from the dev mailing list is unacceptable when you're trying to get work done.

Is it more important to get the result, or to use other people's code? Reinventing the wheel is a minor sin compared to not getting results.

[1] http://www.taverna.org.uk

I just think that very much depends on the field and the problem domain. Taverna seems to be targeted more towards academics who don't know how to code, and most people who use it seem comfortable staying within its limits. You are definitely going to have a much higher level of project specificity than, say, in the web development world. In science, many people are searching for new problems, not just the answers. Why build a gem for email integration if the next best method of communication will likely come out next week?

The problem with this thinking is that it perpetuates itself. I don't write the library that only you would find useful because I don't think it's worth my time. In return, I never receive anything useful because everyone else has adopted that same mindset. As some others pointed out, I think the problem rests in the lack of best practices and poor computer science education among researchers. Teach proper library construction and a test-driven philosophy, and I think you'll see a lot more people become comfortable writing and publishing libraries. Cobble together some basic documentation, keep an eye on how the library is used, and contribute more accordingly. You're never going to escape writing custom scripts, but there are more well-defined problems out there that could use standard solutions.