Hacker News
Ask HN: computational chem/bio
8 points by quantize 5686 days ago
Fellow HN'ers:

I'm in university right now, and I'm looking to work with some professors in the computational sciences. I'm wondering if you guys know of any cool intro projects I could hack on before approaching professors. I'm really just looking for suggestions, because I can't think of a good place to start. Thanks!


It's such a big area that it's hard to know where to start. Have you taken an intro class or something? That might be the best way to figure out what's out there.

I'm not sure what your computational background is, but if you're not familiar with basics like sequence alignment, I suggest you check out a book. I'd recommend An Introduction to Bioinformatics Algorithms by Neil C. Jones and Pavel A. Pevzner (http://www.amazon.com/Introduction-Bioinformatics-Algorithms...). In addition to describing the algorithmic techniques, it gives synopses of current research issues and big names in the field. There are also hundreds of sample problems (some of which are active research problems) and no answer key, because these problems often don't have a clear "best" answer. It's kind of like being taught how to swim by being thrown in a lake: if you survive, you're better off for it.
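To give a flavor of the dynamic programming behind sequence alignment, here is a minimal sketch of Needleman-Wunsch global alignment in Python. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions for illustration; real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""

    def sub(x, y):
        # Substitution score for aligning character x with character y.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

With these parameters, aligning "GATTACA" against "GATACA" scores 4: six matches and one gap. When several alignments tie, the traceback simply returns one of them.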

Also, before approaching professors, it helps to have read and understood their descriptions of their research interests, and maybe a few of their recent papers. This will give you an idea of whether they're doing something you might be interested in.

And then there's quantum mechanics, molecular modeling, microarray analysis, protein folding, drug design, structure visualization, and still more.

If you want good initial projects, try one of these: 1) make a structure viewer (to start, read a PDB file and draw colored spheres in some sort of UI); 2) implement a sequence alignment, then use it to infer the family tree of a set of related sequences; 3) build a web site that lets someone do full-text or other complex searches of UniProtKB/Swiss-Prot, and when you display a record, also render the sequence annotations graphically.
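As a first step toward the structure viewer in 1), here is a sketch that pulls atom records out of a PDB file. It relies on the fixed-width column layout of ATOM/HETATM records in the PDB format; the `Atom` class, the color table, and the fallback for a missing element column are illustrative assumptions, and drawing the spheres is left to whatever UI toolkit you pick.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    residue: str
    chain: str
    element: str
    x: float
    y: float
    z: float


# Illustrative CPK-like colors; anything not listed falls back to pink.
ELEMENT_COLORS = {"C": "gray", "N": "blue", "O": "red", "S": "yellow", "H": "white"}


def parse_pdb(lines):
    """Collect ATOM/HETATM records from an iterable of PDB-format lines."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width fields: 0-based slices of the 1-based PDB column ranges
        # (name 13-16, resName 18-20, chainID 22, x 31-38, y 39-46, z 47-54, element 77-78).
        name = line[12:16].strip()
        # Assumption: if the element column is absent, guess from the atom name's first letter.
        element = line[76:78].strip() or name[:1]
        atoms.append(Atom(
            name=name,
            residue=line[17:20].strip(),
            chain=line[21:22].strip(),
            element=element,
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def sphere_colors(atoms):
    """One color per atom, ready to hand to whatever sphere-drawing code you write."""
    return [ELEMENT_COLORS.get(atom.element, "pink") for atom in atoms]
```

An open file object can be passed straight to `parse_pdb`, since iterating a file yields its lines. The first-letter fallback is crude (it would call a calcium atom "CA" carbon), so prefer the element column whenever it is present.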

I have a master's in computational biology. I did some work on flux balance analysis (FBA), which is a pretty interesting topic.

FBA simulates the metabolic system using linear programming, a technique used most extensively in economics but also applicable to biology. Basically, linear programming is a mathematical method for maximizing (or minimizing) a linear function subject to a set of linear constraints.

The assumption behind applying FBA to metabolic systems is that the cell acts to maximize its growth given what is available in the environment and the stoichiometry of the metabolic reactions that can occur in the cell (i.e., the constraints of the system).
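For reference, this is how FBA is usually written as a linear program (the standard textbook form, not something from this post). $S$ is the stoichiometric matrix with one row per metabolite and one column per reaction, $v$ is the vector of $n$ reaction fluxes, the bounds encode reaction reversibility and nutrient availability, and $c$ picks out the biomass (growth) reaction:

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n.
\end{aligned}
```

The equality $S\,v = 0$ is the steady-state assumption: every internal metabolite is produced exactly as fast as it is consumed.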

It is surprisingly accurate at predicting things like the consequences of metabolic gene knockouts, and it has been applied to identify potential drug targets.

The most renowned researcher in this area is Bernhard O. Palsson. His group[1] has created computational models of different organisms that can be used to perform FBA and test things such as the outcome of gene knockouts. His models are available to download.

There are linear programming libraries available for Linux. I used lpsolve, which has Python bindings. As a starter project, you could do something like identifying the essential genes in an organism like E. coli.
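A toy version of that starter project is sketched below. It uses SciPy's `scipy.optimize.linprog` in place of lpsolve (a swapped-in solver, not what the poster used) and a made-up three-metabolite network instead of one of the Palsson group's models; the reaction names, bounds, and gene-to-reaction mapping are all hypothetical. Growth is the maximum biomass flux subject to the steady-state constraint S v = 0 and the flux bounds, and a gene is called essential if zeroing the bounds of its reactions drives that maximum to (numerically) zero.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows of S are internal metabolites A, B, C;
# columns are reactions. Negative entries are consumed, positive produced.
REACTIONS = ["EX_A", "R1", "R2", "R3", "BIO"]
S = np.array([
    # EX_A   R1    R2    R3   BIO
    [1.0, -1.0, -1.0, -1.0, 0.0],   # A
    [0.0, 1.0, 1.0, 0.0, -1.0],     # B
    [0.0, 0.0, 0.0, 1.0, -1.0],     # C
])
# All reactions irreversible; uptake of A (EX_A) is capped at 10 flux units.
WILD_TYPE_BOUNDS = [(0.0, 10.0)] + [(0.0, 1000.0)] * 4
# Hypothetical gene-to-reaction mapping: one gene per enzymatic reaction.
GENE_REACTIONS = {"g1": ["R1"], "g2": ["R2"], "g3": ["R3"]}


def max_growth(bounds):
    """Maximize the BIO flux subject to S v = 0 (steady state) and the flux bounds."""
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("BIO")] = -1.0  # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return -res.fun if res.status == 0 else 0.0


def essential_genes(threshold=1e-6):
    """Knock out each gene by forcing its reactions' flux to zero; keep those that abolish growth."""
    essential = []
    for gene, reactions in GENE_REACTIONS.items():
        bounds = list(WILD_TYPE_BOUNDS)
        for rxn in reactions:
            bounds[REACTIONS.index(rxn)] = (0.0, 0.0)
        if max_growth(bounds) < threshold:
            essential.append(gene)
    return essential


print("wild-type growth:", max_growth(WILD_TYPE_BOUNDS))
print("essential genes:", essential_genes())
```

In this network R1 and R2 are redundant routes from A to B, so knocking out g1 or g2 alone leaves growth unchanged, while g3 is the only route to C and comes out as essential. Real genome-scale models attach gene-protein-reaction rules (isozymes, complexes) to reactions, which the one-gene-per-reaction mapping here ignores.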

I'd be happy to help. My homepage (in my profile) has my email address.

[1] http://gcrg.ucsd.edu/

Don't wait to approach professors. Do it tomorrow. Each of them will have 10,000 ideas and will be glad to help you work on one of them.
Agreed 100%; that's what worked for me. "I've got this molecule here... [he then described it a bit and showed me the physical metal model he'd made of it, a more permanent variety of the plastic models people use in organic chemistry classes] and the current approaches are inadequate..." Then he lent me a book that would get me started on learning what I needed to know to approach the problem.

Basically, if you can offer a largely "fire and forget" proposition (i.e., you need a problem and pointers to learning material, but not a lot of hand-holding), you should be able to find a professor who has one or more things he'd like to investigate but lacks the resources of one sort or another to pursue, and to which you can apply your particular talents.

Are there any specific types of projects you're interested in? As bendmorris says, this is a vast field. Some pointers that might help:

Genomics:

Take a look at the work Mike Schatz is doing at CSHL: http://schatzlab.cshl.edu/

Or you can check out this recent webcast by C. Titus Brown: http://oreillynet.com/pub/e/1784

On the comp chem side, check out Rajarshi Guha's blog. You don't get much better on the cheminformatics side of things: http://blog.rguha.net/

There are a lot of folks doing very interesting research in genomics, proteomics, cheminformatics, metagenomics, etc. You might also want to ask this question at BioStar: http://biostar.stackexchange.com

Thanks a lot for the links!
From your post I can't tell if you come from a biochemistry or a computational background.

If you have a biochemistry background: I'd recommend that you play to your strengths and maybe concentrate on something that will help in the wetlab. Picking up the biology needed for algorithmic problems such as the statistics of protein and DNA sequences is a well-trodden path for people who know computer science. However, we know nothing about the wetlab, and most of the software I see there is ugly, closed, and expensive.

If you have a computer background: One suggestion is to install R, Bioconductor (bioconductor.org), and start playing with microarray data in the form of CEL files (http://www.ncbi.nlm.nih.gov/gds/).
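If you go that route, Bioconductor handles the standard preprocessing for you (for example `rma()` in the affy package, which includes quantile normalization across arrays). To see what one of those steps does, here is a minimal pure-Python sketch of quantile normalization on made-up intensities; it is an illustration only, not Bioconductor's implementation, and it breaks ties by position instead of averaging them.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of probe intensities).

    Every sample ends up with the same distribution of values: the mean,
    across samples, of the k-th smallest value is assigned to each sample's
    k-th smallest probe. Ties are broken by position, not averaged.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity in this sample.
        order = sorted(range(n_probes), key=s.__getitem__)
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Made-up intensities for three probes on two arrays.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization both arrays contain exactly the same set of values, so differences between them reflect only how each probe ranks within its own array.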

I have a strange background, more of a CS background than bio, so this is a great suggestion. Thanks!