| Molecular dynamics simulations can be used to answer a range of structural biology questions, but abstractly many of them can be phrased as evaluating the difference in free energy between conformational states. In molecular dynamics this is done by thermodynamic integration of the energy over the state-space volume of each conformational state. An alternative approach is to directly map conformational states to their free energy. This leads to a problem of searching for candidate conformational states (e.g. the folded state, transition states, etc.) and scoring them. Usually, for a given computational budget, there is a trade-off between better conformational sampling and higher-accuracy energy scoring. Historically, searching and scoring methods have been designed separately. For example, [1] improves sampling while [2] improves energetics. This is because they involve different aspects of the simulation and each is a lot of work. But searching and scoring are not really separable: the deeper one samples, the harder the scoring function's task of discriminating stable from unstable conformations becomes. Another application that can be thought of as searching and scoring is the game of Go. My impression is that one of the major breakthroughs of AlphaGo is that it integrated models for searching and scoring and learned them simultaneously. It would be awesome if similar architectures could be applied to molecular modeling. A remaining challenge in applying Go models to molecular biology is that while the representation and scoring rules for Go are fixed and quite simple, the ground truth for molecular simulations comes from heterogeneous experimental data (X-ray crystal structures, small-molecule activities, directed-evolution antibody screens, etc.) and QM simulations at higher levels of theory, which have their own challenges. 
However, I think the principles carry over--complicated scoring functions (e.g. free energy) over large state spaces (e.g. protein conformation space or chemical space) can be learned by combining models for searching and scoring. I think deep learning is poised to tackle these problems.
[1] Conway, et al., 2013. Relaxation of backbone bond geometry improves protein energy landscape modeling. DOI: 10.1002/pro.2389
[2] Park, 2016. Simultaneous optimization of biomolecular energy function on features from small molecules and macromolecules. PMID: 27766851 |
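To make "the difference in free energy between conformational states" concrete, here is a minimal sketch on a toy problem. It assumes a one-dimensional double-well energy in place of a real force field and treats each basin as one conformational state. The free energy difference is then F_B - F_A = -kT ln(Z_B / Z_A), where Z is the integral of the Boltzmann weight exp(-E/kT) over the state's region. The function names, the potential, and the state boundaries are all hypothetical choices for illustration, not part of any MD package.

```python
import math

def double_well(x, tilt=0.0):
    """Toy 1D energy with minima near x = -1 and x = +1 (illustrative only)."""
    return (x * x - 1.0) ** 2 + tilt * x

def partition_function(energy, lo, hi, kT=1.0, n=20000):
    """Midpoint-rule integral of exp(-E(x)/kT) over the state region [lo, hi]."""
    dx = (hi - lo) / n
    return sum(math.exp(-energy(lo + (i + 0.5) * dx) / kT) for i in range(n)) * dx

def free_energy_difference(energy, state_a, state_b, kT=1.0):
    """Return F_B - F_A = -kT * ln(Z_B / Z_A) for two state regions."""
    z_a = partition_function(energy, *state_a, kT=kT)
    z_b = partition_function(energy, *state_b, kT=kT)
    return -kT * math.log(z_b / z_a)

# Assumed state definitions: A is the left basin, B is the right basin.
STATE_A = (-2.0, 0.0)
STATE_B = (0.0, 2.0)

# Symmetric wells: the two states should be equally favorable.
dF_symmetric = free_energy_difference(double_well, STATE_A, STATE_B)

# A positive tilt raises the energy of the right basin, so B becomes less favorable.
dF_tilted = free_energy_difference(lambda x: double_well(x, tilt=0.5), STATE_A, STATE_B)
```

In one dimension the integral can be done by brute force. For a protein the same integral runs over thousands of coordinates, which is why it has to be estimated by sampling, and why the quality of the search and the quality of the energy function are hard to separate.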
The search space in Go is vast, but it inevitably shrinks over the course of a game, whereas in MD simulations it does not shrink: proteins can fold and unfold. There is a fixed number of legal positions to play in Go, but the set of legal moves in protein conformation space fluctuates wildly, is governed by physics (which you would need to relearn), and is likely to be much larger than in Go since it is continuous. In simulations you care about successive moves, whereas AlphaGo does not care about time-dependent properties (there are also kinetic observables, like folding rates, that do not seem straightforward to evaluate without simulations). Even if you sampled enough conformations along some pathway, perhaps some sort of allosteric change, how would you know how fast it happens? In Go you always play the same game, but in simulations you often play different games, i.e., you don't want to be unfolding your protein when you are studying ligand binding. In a similar vein, imagine a single-point mutation that causes protein misfolding. It seems to me that you'd need to retrain your search/score algorithm for each new protein sequence, which doesn't seem like it would save much time or complexity. There is also a huge problem of scale. We're talking about proteins varying from hundreds to hundreds of thousands of atoms/dihedrals/contacts, not to mention sampling water in the active sites of druggable proteins.
I think it could work in principle, but a physics-based approach sure seems elegant by comparison.
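As a rough back-of-the-envelope check on the state-space comparison, the sketch below counts configurations on a log10 scale. The Go figure is the simple upper bound of three possibilities (empty, black, white) for each of the 19 x 19 points, ignoring legality. The protein figure rests on an assumed discretization that is not from the discussion above: only the two backbone dihedrals (phi, psi) per residue, each split into a fixed number of bins. The helper names are hypothetical.

```python
import math

def log10_states(choices_per_site, sites):
    """log10 of choices_per_site ** sites, computed without building huge integers."""
    return sites * math.log10(choices_per_site)

# Go: each of the 19 x 19 points is empty, black, or white (upper bound, ignores legality).
go_log10 = log10_states(3, 19 * 19)

def protein_log10(residues, bins_per_dihedral, dihedrals_per_residue=2):
    """Assumed model: backbone phi/psi only, each dihedral split into equal bins."""
    return log10_states(bins_per_dihedral, residues * dihedrals_per_residue)

# A 100-residue backbone at two assumed resolutions: 3 bins and 36 bins (10 degrees each).
coarse = protein_log10(residues=100, bins_per_dihedral=3)
fine = protein_log10(residues=100, bins_per_dihedral=36)
```

Under these assumptions the count for the protein depends entirely on the chosen resolution: a very coarse grid on a small protein gives fewer states than the Go bound, while a finer grid exceeds it, and the count keeps growing as the bins shrink. That is one way to read the point that the conformational space is continuous rather than fixed.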