Hacker News
by nabla9 3041 days ago
It would be cool if machine learning researchers started participating in CASP and CAPRI. If you crack Go, you get fame, but if you crack protein structure prediction, you get a Nobel Prize and completely revolutionize biochemistry and medicine.

http://predictioncenter.org/

http://www.ebi.ac.uk/msd-srv/capri/

edit: Why is there no XPRIZE for protein folding?


DeepMind and others are trying. "Hassabis said the company is now planning to apply an algorithm based on AlphaGo Zero to other domains with real-world applications, starting with protein folding." [1]

[1] https://www.bloomberg.com/news/articles/2017-10-18/deepmind-...

That doesn't make any sense unless I'm missing something; A0 is suited to a completely different problem than protein folding...
The AlphaZero algorithm (Monte Carlo tree search with a value estimator trained by reinforcement learning) works on any environment you can simulate during play, single-player or not.
Any environment with finite action and state spaces.
No, the key requirement which makes it difficult to use on real-world tasks is that you must be able to do a forward rollout of your environment in your decision-making process.
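To make the forward-rollout requirement concrete, here is a minimal sketch, assuming the textbook 2D HP lattice model as a toy stand-in for folding (a hypothetical choice, far simpler than CASP-style structure prediction). It is plain UCT Monte Carlo tree search with random playouts; AlphaZero instead uses PUCT selection with a learned policy prior and a learned value network. The point is that every expansion and playout calls `step` on hypothetical states, which is only possible because this toy environment can be simulated exactly. All names (`legal_actions`, `step`, `contacts`, `mcts_fold`) are illustrative, not from any library.

```python
import math
import random

# Hypothetical toy environment: the 2D HP lattice model. Each residue is 'H'
# (hydrophobic) or 'P' (polar) and is placed on a square grid as a
# self-avoiding walk. Score = number of H-H pairs that are grid neighbours
# but not neighbours in the chain (more contacts = better fold).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def legal_actions(coords):
    """Finite action space: moves that keep the walk self-avoiding."""
    x, y = coords[-1]
    occupied = set(coords)
    return [m for m in MOVES if (x + m[0], y + m[1]) not in occupied]

def step(coords, move):
    """Forward model: lets the search try moves on hypothetical states."""
    x, y = coords[-1]
    return coords + ((x + move[0], y + move[1]),)

def contacts(seq, coords):
    """Count non-consecutive H-H grid contacts, each pair once (j > i + 1)."""
    index = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                total += 1
    return total

class Node:
    def __init__(self, coords, n, parent=None):
        self.coords, self.parent = coords, parent
        self.children = {}  # move -> Node
        # A complete chain (or a trapped walk) has no moves left to try.
        self.untried = [] if len(coords) == n else legal_actions(coords)
        self.visits, self.value_sum = 0, 0.0

def rollout(seq, coords, rng):
    """Random playout as the value estimate (AlphaZero uses a value net)."""
    while len(coords) < len(seq):
        moves = legal_actions(coords)
        if not moves:
            return 0.0  # trapped walk: treat as a failed fold
        coords = step(coords, rng.choice(moves))
    return float(contacts(seq, coords))

def uct_child(node, c=1.4):
    """UCB1 selection: mean value plus an exploration bonus."""
    return max(node.children.values(),
               key=lambda ch: ch.value_sum / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts_fold(seq, iterations=300, seed=0):
    rng = random.Random(seed)
    coords = ((0, 0),)
    while len(coords) < len(seq):
        root = Node(coords, len(seq))
        if not root.untried:
            break  # walk is trapped; return the partial fold
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes.
            while not node.untried and node.children:
                node = uct_child(node)
            # 2. Expansion: simulate one new move with the forward model.
            if node.untried:
                move = node.untried.pop()
                child = Node(step(node.coords, move), len(seq), parent=node)
                node.children[move] = child
                node = child
            # 3. Evaluation of the new leaf via a random playout.
            value = rollout(seq, node.coords, rng)
            # 4. Backpropagation up to the root.
            while node is not None:
                node.visits += 1
                node.value_sum += value
                node = node.parent
        # Commit to the most-visited move, then search again from there.
        best_move = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
        coords = step(coords, best_move)
    return coords
```

Calling `mcts_fold("HPHPPHHPHH")` returns a tuple of lattice coordinates, and `contacts` scores it. For real proteins there is no cheap, exact `step` of this kind, which is the difficulty the comment above points at.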
FWIW, AlphaGo-like algorithms have already been applied to this domain; see AlphaChem.
I used to work on protein structure about ten years ago.

Back then, the mood had kind of changed from “solve this and you have a Nobel waiting”: the general opinion was that progress was both significant and piecemeal, making it unlikely a Nobel would be awarded, because credit for “cracking it” would end up being too hard to assign to any three people.

lol what. I was looking at the list of people who do this, and in fact a lot of them ARE machine learning researchers... including some in my department!
I do think, however, that protein folding is very much understudied in the ML community, relative to, say, the big three of vision, NLP, and speech. The lack of standardized datasets and benchmarks, not to mention the need for domain knowledge, has made it difficult to get into the field.
At the risk of offending the NLP/vision/speech folks, I just think those tasks are 'easier' in a variety of ways.
CASP is a pretty nice dataset, and so is the whole PDB.
The PDB represents the best we have, but I wouldn't call it a great dataset for learning. The 150,000 known structures are a drop in the ocean when it comes to the space of possible sequences/structures.
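The "drop in the ocean" point can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming a 20-letter amino-acid alphabet and counting sequences only (not conformations), with the 150,000 figure taken from the comment above; `log10_sequence_space` is a hypothetical helper name.

```python
import math

KNOWN_STRUCTURES = 150_000  # figure quoted in the comment above
ALPHABET = 20               # standard amino acids; ignores modified residues

def log10_sequence_space(length):
    """log10 of the number of distinct sequences of the given length (20**length)."""
    return length * math.log10(ALPHABET)

for n in (50, 100, 300):
    exponent = log10_sequence_space(n)
    # Even if every known structure had a distinct sequence of this length,
    # the covered fraction would only be 150,000 / 20**n.
    fraction = math.log10(KNOWN_STRUCTURES) - exponent
    print(f"length {n:3d}: ~10^{exponent:.0f} sequences, covered fraction ~10^{fraction:.0f}")
```

For a 100-residue chain alone, 20^100 is about 1.3 × 10^130 possible sequences, so the known structures sample a vanishingly small fraction of the space.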
It's happening.
I would guess that there have been many attempts to use ML for protein folding. It's one of the most obvious ways to approach the problem.