by p3ll0n 5798 days ago
In addition to Lincoln's thoughts, I think one of the main reasons bioinformaticians are attracted to Perl is that it is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can also be a curse.
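To make the regex cleanup idea concrete, here is a minimal sketch in Python. It normalizes hand-entered fields and splits a field that holds replicate values. The function names (`clean_field`, `parse_replicates`), the set of "missing" placeholders, and the `;` / `/` separators are all hypothetical assumptions for illustration, not anything from the comment or the article. The patterns themselves would carry over to Perl largely unchanged.

```python
import re

# Placeholders hand-entered data often uses for "no value".
# This set is an illustrative assumption, not drawn from any real dataset.
MISSING = {"", "na", "n/a", "-", "?"}

def clean_field(value):
    """Normalize one hand-entered field; return None if it is effectively empty."""
    if value is None:
        return None
    # Collapse runs of whitespace (tabs, doubled spaces) and trim the ends.
    value = re.sub(r"\s+", " ", value).strip()
    if value.lower() in MISSING:
        return None
    # Accept a comma as the decimal separator for a bare number: "3,5" -> "3.5".
    return re.sub(r"^(-?\d+),(\d+)$", r"\1.\2", value)

def parse_replicates(value):
    """Split a field that may hold several replicate measurements, e.g. "1.2; 1.4"."""
    cleaned = clean_field(value)
    if cleaned is None:
        return []
    values = []
    # Assumed separators: ';' or '/', with optional surrounding spaces.
    for part in re.split(r"\s*[;/]\s*", cleaned):
        part = clean_field(part)
        if part is not None:  # skip empty pieces such as "1.2;;1.4"
            values.append(float(part))
    return values

print(parse_replicates("1.2; 1.4"))   # [1.2, 1.4]
print(parse_replicates("3,5 / 3,7"))  # [3.5, 3.7]
print(parse_replicates(" n/a "))      # []
```

The "curse" shows up quickly: a typo such as `"1.2; l.4"` still reaches `float()` and raises `ValueError`, and each extra correction rule makes it easier to silently "fix" a value that was actually wrong.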

A paragraph very similar to this one occurs in the article.
From the article:

"Perl is forgiving. Biological data is often incomplete, fields can be missing, a field that is expected to be present once occurs several times (because, for example, an experiment was run in triplicate) or the data gets entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to detect and correct a variety of common errors in data entry. Of course, this flexibility can also be a curse, as I'll discuss in more detail later."

A few words differ. For example, the article says "triplicate" where p3ll0n says "duplicate". Still, the two passages are similar enough to use as test input for a diff algorithm.
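As a quick illustration of that diff idea, here is a minimal word-level comparison using Python's standard-library `difflib.SequenceMatcher`, run on short excerpts of the two passages. Tokenizing on word characters only (so punctuation differences are ignored) is an assumption of this sketch; a character- or line-level diff would report the changes differently. The helper name `word_diff` is hypothetical.

```python
import difflib
import re

def word_diff(a, b):
    """Compare two passages word by word.

    Returns (changes, ratio): changes lists each non-matching run as
    (tag, words_from_a, words_from_b); ratio is SequenceMatcher's similarity in [0, 1].
    """
    # Tokenize on word characters only, so punctuation differences are ignored.
    a_words = re.findall(r"\w+", a)
    b_words = re.findall(r"\w+", b)
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    changes = [
        (tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return changes, matcher.ratio()

# Short excerpts: p3ll0n's comment first, then the article.
comment = ("an experiment was run in duplicate), or the data was entered by hand. "
           "Regular expressions can be written to pick up and correct")
article = ("an experiment was run in triplicate) or the data gets entered by hand. "
           "Regular expressions can be written to detect and correct")

changes, ratio = word_diff(comment, article)
for change in changes:
    print(change)
# ('replace', 'duplicate', 'triplicate')
# ('replace', 'was', 'gets')
# ('replace', 'pick up', 'detect')
print(f"similarity: {ratio:.2f}")
```

A high ratio with only a handful of one- or two-word replacements is the pattern you would expect from lightly edited copied text.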

EDIT: Also from this guy's comment history:

http://news.ycombinator.com/item?id=1456105

Some of the phrasing appears to have been copied and pasted from this article by Jonathan Ellis:

http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosyste...

I bet if you could make a bot that did this (go out, find relevant information, and summarize it), you could provide a serious public service, as long as it cited its sources so it isn't a plagiarism bot.

That bot is easy to write in Perl! I have a document summarizer written already.
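The summarizer mentioned here isn't shown in the thread. One common, simple approach is frequency-based extractive summarization: score each sentence by how often its content words occur across the whole text, then keep the top-scoring sentences in their original order. The sketch below is a hypothetical Python illustration of that technique, not the commenter's code; the stopword list, the naive sentence splitter, and the `summarize` name are assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real summarizer would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
             "for", "be", "can", "or", "as", "this", "on", "with", "was"}

def content_words(sentence):
    """Lowercased words in a sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Return the max_sentences highest-scoring sentences, in original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]

text = ("Perl is forgiving. Biological data is often messy. "
        "Perl handles messy biological data well. Cats sleep.")
print(summarize(text))
# ['Biological data is often messy.', 'Perl handles messy biological data well.']
```

Averaging rather than summing word frequencies keeps long sentences from winning just by being long. A bot that cites its sources would also need to keep track of which document each kept sentence came from.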