Hacker News
by cing 2303 days ago
So what you're saying is: https://xkcd.com/1831/, except that CS/ML practitioners have a negative impact by trying to contribute without understanding the nuance. I think the next logical question is: how many years of education should you have in order to contribute? 10 years? We'll all be killed by a virus by then :)

Hah, I forgot that one. Part of the negative impact is the time spent explaining stuff (as in Brooks's The Mythical Man-Month). Another negative impact is that ML folks have gotten really good at hype: a paper with a slick web page, a press release, etc., but the results don't stand up to the claims.

I generally recommend PhD-level study in biology (that's 7 years on top of undergrad) but I think a really smart person could learn most of what's required in 2-3 years if they are in a good lab.

No, we will not all be killed by a virus in the next 10 years; that's just media alarmism. Remember, even if coronavirus becomes a worldwide pandemic, some fraction of people will be genetically immune and will survive. We're much more at risk from wars, climate change, and driving cars.

So, is there a book (or collection of books) that would save everyone's time? Asking for a friend.
Personally, I prefer the classic textbook approach, so I recommend Principles of Biochemistry (Lehninger); Biological Sequence Analysis (Durbin, Eddy, Krogh, Mitchison), which is sadly pretty dated now; a general biology text, Biology (Campbell); and finally, if you really want to dive down the rabbit hole of a complex biological problem with huge health applications, The Biology of Cancer (Weinberg).

I've had this argument with folks before and some people seem fine learning in other ways, but I really prefer the textbook approach, especially textbooks which are basically just summaries of the current understanding of the field, with direct links to the detailed review articles.

How many authors on the DeepMind paper had biology PhDs? Are they really just gaming things in an unfair way?
The CEO of DeepMind is an author on the paper; his PhD is in biology (but a totally different field, cognitive neuroscience). The rest of the authors include all the ingredients you'd expect from a modern successful quantitative scientific collaboration: a university professor of bioinformatics who has a huge amount of prior knowledge of computer-aided protein folding (http://www0.cs.ucl.ac.uk/staff/D.Jones/), several postdoc or post-postdoc level bio/protein experts with knowledge of physical simulation (the method they used ultimately works as distance and angle constraints on the protein structure), as well as a bunch of world-class machine learning/computer science folks.

They're not gaming things. DeepMind is good at games, and CASP is a competition, but everybody who does well at CASP is already doing the same sorts of things that DeepMind did to score well. And they really did come up with a good system that was demonstrably better (I want to give them credit; I just don't think 'breakthrough' is really correct). But one thing I know about CASP (I competed one year) is that after 2 years, whatever the previous winning team did is duplicated by the other top teams, and 2 years after that, everybody can do it.

I think ML is speeding up that cycle in protein folding competitions like CASP, because you can put your code, training data generator (much of the hard work in protein folding is coming up with good training data), a materialized copy of the training data the generator generates, and a trained model checkpoint on GitHub, so after 2 years, everybody will be able to do what DeepMind did at the previous competition. I think this has been one of the really important improvements in the last few years in protein folding: the computational infrastructure (the training data, the systems to train on, and the tools to do training) has gotten much better, and lots of people have gotten good at using it. That's a really promising sign and I hope it takes over more of quantitative science.