by jamesli 4188 days ago
Biology is enormous, as the author pointed out. Our present understanding of biology might be less than 1% of all there is to know. Most biological research doesn't require any advanced knowledge of math beyond basic statistics.

For example, many labs have been studying an important gene, and how other genes are functionally related to it, for more than 10 years. The research involved consists simply of tedious, but indispensable, biological experiments. It is a waste of time to study math for this work, because it doesn't apply, except for some basic statistics in data analysis.

Disclaimer: I have advanced degrees in both biology and computer science, and have multiple years of biomedical research experience. I have met exceptionally smart people working in biology, CS, math, and physics. Yes, these smart biologists don't understand advanced topics in math and physics. I believe, however, that if they had studied math or physics, they would have been excellent mathematicians or physicists.


IANAB. From what I understand, DNA research still seems to have lots of low-hanging fruit where simple mathematical models could achieve big breakthroughs. Yamanaka won a Nobel Prize for cell reprogramming, which simply came from abandoning the previous brute-force method of finding the correct molecule combination, a method that had led to many years of biologists trying out combination after combination. Instead, he basically deduced it through simple modelling and applying the scientific method. A quick Google search turned up the following thesis, which builds further modelling on Yamanaka's work in order to refine stem cell reprogramming [1]. From my point of view, the really outstanding work in biology currently comes from outsiders who break out of the usual methods of biologists, such as the applied mathematician Erez Lieberman Aiden, who showed how genome folding works and that it actually has an important function (activating / deactivating regions in order to program cell functions), purely through mathematical modelling of the signals we can get out of current instruments and throwing HPC at it [2]. I'm pretty sure the field would benefit greatly from more cross-pollination from other fields.

[1] https://www2.hu-berlin.de/biologie/theorybp/docs/dipl_scharp...

[2] http://www.sciencemag.org/content/326/5950/289.short

Well said.

The study of biology is further complicated by the large number of confounding factors that muddle experimental results. Because of this, it is hard to know exactly when it is appropriate to bring in mathematics. Without a proper understanding of all the variables, math can only get you so far.

Well, another way to consider this is that biologists have not pushed back hard enough on mathematicians, in the sense of asking for tools that would allow just slurping up a vast amount of unstructured, unprocessed data and getting something out of it.

It is certainly true that mathematical modeling as it is currently done will only get you so far.

But the spirit of math in conjunction with physics has been to create tools that allow progress by leaps and bounds. If we want to follow that spirit, it seems appropriate to ask for tools to help with messy things that can't easily be dealt with now. It may not be possible, but it seems worthwhile to go all the way to the brick wall and pound on it.

Unfortunately, biologists have been taken for a ride many times by people selling mathematical snake oil. For example, the whole field of DNA microarrays [1] turned out to be an illusion woven out of applying complex statistical tools to a "vast amount of unstructured, unprocessed data". There really is no way you can gain real understanding from poorly designed and unrepeatable experiments by applying obscure mathematical tools.

[1] http://en.wikipedia.org/wiki/DNA_microarray

Luckily, complex numbers and polynomials are not snake oil.
Wouldn't it be useful for biologists to know more advanced statistical methods?
It is, for sure.

There are some other points to consider. First, the dataset sizes for most biomedical research are very small, so most advanced statistical methods don't apply. Out of curiosity, I took some advanced stat courses and tried to apply the methods to our lab's data. They didn't provide any significant improvement over basic ones like linear regression, logistic regression, etc.

Second, biomedical research is highly collaborative these days. For research that generates a large amount of data, either the researchers themselves understand statistics very well, or they collaborate very closely with statisticians. There is a field called biostatistics. Most biostatistics professors majored in either math or statistics, and many of them are adjunct professors in biomedical departments.

Biomedical research is really tedious and time-consuming. The professors I knew when I was doing biomedical research worked more than 60 hours a week, and they wished they had more time. One young professor came to the lab at 8am, left at 6pm, spent some time with her 4 children, came back to the lab at 9pm, and worked until midnight, every weekday. She brought her children to the lab on Saturday and worked the whole day. IMHO, it is better for her to focus all her energy on the biomedical part, which she is best at, and collaborate with statisticians.

This is a very important point, I think. Only in very rare cases have I found my research actively improved by having a more sophisticated method available, and most projects already have a statistician as a collaborator. If not, they're readily available. It benefits a biologist to know what the statistician is talking about, and not just treat the analysis as a black box, but there's a reason we have subject matter experts. Sometimes, someone saying "Make sure to use robust variance" is enough information.
It depends on what research they're doing. It's also quite easy to be led astray and produce poor work by trying to throw the newest, shiniest thing at something when a much more basic technique will do.

For example, for much of the work I do, you could get away with never using anything more sophisticated than ANOVA.
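To give a sense of how basic that technique is, here is a minimal sketch of a one-way ANOVA F statistic in plain Python. The function name and the three groups of measurements are made up purely for illustration; they are not from anyone's actual data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical toy measurements (assumption, for illustration only).
control = [4.0, 5.0, 6.0]
treatment_a = [6.0, 7.0, 8.0]
treatment_b = [8.0, 9.0, 10.0]

f_stat = one_way_anova_f([control, treatment_a, treatment_b])
print(f"F = {f_stat:.1f}")  # prints "F = 12.0"
```

A large F means the group means differ by more than the within-group scatter would suggest. Turning F into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves out.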

I take the opposite stance - if biologists knew about advanced statistical methods they might be tempted to use them.

The general rule in biology is that if you need to use statistics, you did the wrong experiment. The reason for this rule is that it is all too easy to use clever statistical methods to compensate for a flawed experimental design.

It should be noted that "Biology" also encompasses fields where you are limited to uncontrolled observational experiments, which often necessitate more advanced methods.
True, but in this case you should have chosen another field :)
I agree that more advanced statistical methods would be useful. A surprising number of scientists have poor knowledge about statistics whilst being dependent on it to prove their research.
If anyone is depending on statistics to "prove" anything, they're in trouble.
How do you prove that a medicine is safe and effective if not through large-scale studies, whose results you then analyze with statistics to show whether your hypothesis was correct or not?
With statistics I can definitely prove that something holds with a confidence of, say, 0.95, under the model assumptions [...].
If you mistake the p-value on a frequentist hypothesis test for a Bayesian posterior probability, you've already messed up.
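A rough simulation makes the distinction concrete. This is only a sketch under assumptions I picked for illustration: 90% of tested hypotheses are true nulls, real effects are 0.5 standard deviations, each experiment has n = 30 and is analyzed with a two-sided z-test with known unit variance. None of these numbers come from the thread. It estimates how often the null is actually true among the results with p < 0.05.

```python
import random
from statistics import NormalDist

def share_of_true_nulls_among_significant(
    n_experiments=100_000, prior_null=0.9, effect=0.5, n=30, alpha=0.05, seed=0
):
    """Estimate P(null is true | p < alpha) by simulating many z-tests."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    significant = 0
    significant_and_null = 0
    for _ in range(n_experiments):
        null_is_true = rng.random() < prior_null
        # The z statistic is N(0, 1) under the null and
        # N(effect * sqrt(n), 1) when the effect is real.
        z_center = 0.0 if null_is_true else effect * n ** 0.5
        z = rng.gauss(z_center, 1.0)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            significant += 1
            significant_and_null += null_is_true
    return significant_and_null / significant

share = share_of_true_nulls_among_significant()
print(f"P(null true | p < 0.05) is roughly {share:.2f}")
```

Under these assumptions the share comes out far above 0.05, because the p-value is P(data at least this extreme | null), not P(null | data). The posterior depends on the prior share of true nulls and on the power of the test, neither of which the p-value knows anything about.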
Touché, 95% of people who prove things on the internet use statistics.