|
Great questions! The typical input to the neural network is the 3D structure of the molecule and of the protein. The model works by detecting patterns in the protein-drug pair that correlate with binding, e.g. hydrogen bonding, halogen bonding, cation-pi and pi-pi interactions, etc. But these are complicated to encode manually, given all of the factors that affect binding strength: distance, angle, water-mediated effects, resonance, (de)stabilizing environmental charges, etc. That's why we need the neural net: you can think of it as automatically deriving the pharmacophoric features that best explain which training examples bind and which don't; the prediction step then looks for the presence or absence of those patterns in new protein-ligand pairs.

We evaluate our models both retrospectively and prospectively. For example, the DUD-E benchmark (http://dude.docking.org/) lets us assess our performance over more than a million individual predictions, spanning many diseases and many biological classes (GPCR, nuclear receptor, enzyme, etc.). It begins with 102 disease proteins and, for each one, has a set of molecules that bind to the protein and a set that don't. We shuffle those sets together and ask the neural net to "pick the aces out of the deck". Separately, we perform prospective evaluations: in settings where no one knows the right answer, we make predictions and then run the experiment to confirm them.

I agree with you that the proper selection of targets is critical, as is the mapping between drug target and disease. For us, however, this is easy: we work with smart biologists! If you know any, please send them our way!

Finally, I agree with your point that biology is not designed to be understood by people. That said, molecular binding is fundamental enough that we could think of it as an example of physics rather than biology.
And theory works so well for physics that, in many a physics lab, if an experiment disagrees with theory then the first step is to double-check the experiment for errors. The trick is to scale that up to larger systems. Semi-relevant: http://www.smbc-comics.com/?id=2272 |
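
To make the "pick the aces out of the deck" evaluation a bit more concrete, here is a minimal sketch of how a shuffled set of binders and non-binders could be scored once a model has ranked them. The two metrics (ROC AUC and an enrichment factor), the function names, and the toy numbers are all illustrative assumptions, not a description of our actual evaluation code.

```python
def roc_auc(scores, labels):
    """Chance that a random binder outscores a random non-binder (ties count half)."""
    binders = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > d) + 0.5 * (b == d) for b in binders for d in decoys)
    return wins / (len(binders) * len(decoys))


def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    hit_rate_top = sum(y for _, y in ranked[:top_n]) / top_n
    hit_rate_all = sum(labels) / len(labels)
    return hit_rate_top / hit_rate_all


if __name__ == "__main__":
    # Hypothetical model scores for six molecules against one protein;
    # label 1 = known binder ("ace"), label 0 = known non-binder.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 0, 1, 0, 0, 0]
    print("ROC AUC:", roc_auc(scores, labels))
    print("Enrichment in top half:", enrichment_factor(scores, labels, fraction=0.5))
```

An AUC of 0.5 corresponds to drawing cards at random, and 1.0 means every ace was ranked above every other card. The enrichment factor answers a more practical question: if you could only afford to test the top slice of the ranking, how many more binders would you find than by picking at random?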