Hacker News
by c1sc0 5800 days ago
Perl was the only language on the block with strong built-in text-processing capabilities. For many a biologist the Camel book was the only programming book they read before moving on to solve real biological problems instead of fiddling with programs.
2 comments

It was also fast enough that learning a faster language wasn't worth the time.
Without going out on a limb: back then, if you knew Perl, you knew C.
I learnt Perl and used it on projects long before I learnt C.
From the bit of bioinformatics work I did in college, that actually seemed like a problem rather than a boon. Genes are not ASCII sequences, and Perl is not really made to manipulate them.

Perl is, however, an excellent scripting language, and though it's ugly, it's just as easy as Python to pick up and use. Thus its danger.

Can you elaborate a bit on "genes are not ASCII sequences"? My understanding is that genes are regularly stored on computers as ASCII sequences, e.g. GATTACA.
Yes. I think I came off as more negative towards Perl than I meant to. The point is that genes are far simpler than ASCII. They are quaternary data, so regexes aren't really a good tool for modifying them, any more than regexes are a good tool for modifying raw binary, even if your binary is stored as a string of ASCII characters.

Perl has other strengths, though, chiefly quick scripting, that make it as good a tool as any other, so long as you don't get sucked into the idea that regexes are a good tool here.

Thanks for the reply. That's definitely true. I'd imagine a gene sequencing system using a two-bit encoding would make the gene data more compact and, in many cases, faster to process.
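
For what it's worth, a two-bit packing could look roughly like this in Python. This is only a sketch: the names `pack` and `unpack` and the particular A/C/G/T-to-bits mapping are my own choices, not any standard format, and it assumes uppercase sequences with no ambiguity codes such as N (anything else raises a KeyError).

```python
# Hypothetical two-bit mapping; the assignment of bits to bases is arbitrary.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i is the base whose code is i


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte.

    The first base of each group of four goes into the two highest bits.
    The final byte is zero-padded if len(seq) is not a multiple of four.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODE[base] << shift
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data.

    The length has to be stored separately, because padding bits in the
    last byte are indistinguishable from the code for A.
    """
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(length)
    )


packed = pack("GATTACA")
print(len(packed))          # 7 bases fit in 2 bytes instead of 7
print(unpack(packed, 7))    # round-trips back to the original string
```

The cost of the compactness is that the sequence length has to travel alongside the bytes, and anything outside the four bases needs a separate representation.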