Hacker News
by momeara 3531 days ago
I think one of the biggest potential uses for molecular autoencoders is generating inputs for virtual high-throughput screening campaigns to predict new drugs. The idea would be to train models to predict compounds that can then be evaluated with more physically realistic molecular docking simulations --> in vitro activity assays --> animal models --> and then clinical trials as they move through the pipeline.

Here is an example from our lab using virtual screening to develop PZM21 to treat pain [1], where we screened 3M compounds. We would have liked to screen 10^6-fold more compounds to cover easily synthesizable chemical space in this and other campaigns, but that is currently computationally infeasible. If molecular autoencoders could help us screen this space more efficiently, it would be huge.

I'm co-organizing a free, 1-day workshop on deep learning for chemoinformatics at Stanford on Nov 11th. We've got ~75 mostly computational chemistry researchers coming. I would love to have more machine learning researchers come as well. The website is deepchemworkshop.docking.org, or PM me if any of you think you may be interested.

[1] Manglik, et al. Structure-based discovery of opioid analgesics with reduced side effects (doi:10.1038/nature19112)

3 comments

Several people have asked for background material for the workshop--

(Wallach, 2015, http://arxiv.org/pdf/1510.02855.pdf) AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

(Duvenaud, 2015, http://papers.nips.cc/paper/5954-convolutional-networks-on-g...) Convolutional Networks on Graphs for Learning Molecular Fingerprints

(Kearnes, 2016, http://arxiv.org/abs/1606.08793v1) Modeling Industrial ADMET Data with Multitask Networks

(Gómez-Bombarelli, 2016, doi:10.1038/nmat4717) Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach

and of course

(Gómez-Bombarelli, 2016, https://arxiv.org/abs/1610.02415) Automatic chemical design using a data-driven continuous representation of molecules

Someone with PZM21 knowledge! Intriguing! Would there be any point to using low-granularity approximations of 'disjoint-class-ish' molecular backbones and building upon best candidates, doing some kind of low-res hill-climbing in effect, before increasing granularity with the best ones?

Granularity might be some kind of "well we need some kind of phenol or phenol-derived ring here, why not just replace that with some sort of representation of 'phenol-like-ring-here'" or something to that effect.

Also, about PZM21 -- will it ever experience the same fate of U47700, or the likes of the orphaned opioids resurfacing from their watery graves?

In theory I think you are right--there should be a tower of representations from low-res/fast to high-res/slow. In practice, though, it has been hard to make multi-resolution models work together. For example, for proteins, where the backbone is much more regular than in small molecules, Rosetta has "centroid mode" and "full atom mode". There are also MM/QM models, where just the active site is modeled at a higher level of theory.

For virtual screening it is possible to speed things up by say not taking into account receptor flexibility or ignoring explicit interactions with water.
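To make the low-res/fast then high-res/slow idea concrete, here is a minimal sketch of a two-stage screening funnel. It is only an illustration: `funnel_screen`, `toy_cheap`, and `toy_expensive` are hypothetical names, the scores are toy functions on SMILES-like strings rather than real docking scores, and lower is assumed to be better.

```python
from typing import Callable, Dict, List, Tuple


def funnel_screen(
    compounds: List[str],
    cheap_score: Callable[[str], float],
    expensive_score: Callable[[str], float],
    n_keep: int,
) -> List[Tuple[str, float]]:
    """Rank everything with the cheap score, keep the best n_keep,
    then re-rank only those survivors with the expensive score.
    Lower scores are treated as better (an assumption of this sketch)."""
    survivors = sorted(compounds, key=cheap_score)[:n_keep]
    rescored = [(c, expensive_score(c)) for c in survivors]
    return sorted(rescored, key=lambda pair: pair[1])


# Toy stand-ins for a fast, crude model and a slow, detailed model.
calls: Dict[str, int] = {"expensive": 0}


def toy_cheap(compound: str) -> float:
    # Crude proxy: prefer strings of length 5.
    return abs(len(compound) - 5)


def toy_expensive(compound: str) -> float:
    # Pretend this is costly; count how often it is called.
    calls["expensive"] += 1
    return abs(len(compound) - 5) + 0.1 * compound.count("N")


library = ["C", "CCO", "CCCCN", "CCCCC", "CCCCCCCCCC",
           "c1ccccc1", "CCN", "CCCCCO", "CC", "CCCCCCC"]
ranked = funnel_screen(library, toy_cheap, toy_expensive, n_keep=3)
```

The expensive function only runs on the three survivors instead of all ten compounds. The catch, as with any funnel, is that anything the cheap stage misranks never reaches the expensive stage.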

As for lower-resolution representations of small molecules, there are ROCS [1] and friends, which represent small molecules as a set of Gaussians.
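As a rough illustration of what "a set of Gaussians" buys you, here is a sketch of a Gaussian shape overlap and a shape Tanimoto score. This is not ROCS's actual implementation: it places one identical spherical Gaussian on each atom, uses an assumed width parameter `ALPHA`, and does no alignment or optimization of the overlay. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float]  # x, y, z coordinates in angstroms

ALPHA = 0.8  # assumed Gaussian width parameter (1/angstrom^2), illustrative only


def pair_overlap(a: Atom, b: Atom, alpha: float = ALPHA) -> float:
    """Closed-form integral over space of
    exp(-alpha*|r-a|^2) * exp(-alpha*|r-b|^2),
    which follows from the Gaussian product theorem."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return (math.pi / (2.0 * alpha)) ** 1.5 * math.exp(-0.5 * alpha * d2)


def overlap(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """Sum of pairwise atom-atom overlaps, i.e. the overlap integral
    of the two summed Gaussian densities."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """Tanimoto-style similarity of two fixed (pre-aligned) shapes."""
    v_ab = overlap(mol_a, mol_b)
    v_aa = overlap(mol_a, mol_a)
    v_bb = overlap(mol_b, mol_b)
    return v_ab / (v_aa + v_bb - v_ab)


def translate(mol: List[Atom], dx: float) -> List[Atom]:
    """Shift every atom along x by dx."""
    return [(x + dx, y, z) for x, y, z in mol]


mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
same = shape_tanimoto(mol, mol)
nudged = shape_tanimoto(mol, translate(mol, 0.5))
far = shape_tanimoto(mol, translate(mol, 50.0))
```

Because every term is a smooth closed-form expression, the score is cheap to evaluate and differentiable with respect to atom positions, which is what makes this kind of representation attractive for fast overlay searches.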

One of the challenges with low-resolution representations is that the aim of virtual screening is often to find novel backbones that may interact with the protein. So any low-resolution representation should group different backbones into the same cluster, but finding such a representation is difficult given the diversity of small molecules.

As for U47700, finding the mechanism of action for drugs that treat complex processes like pain is quite difficult. Also, small molecules often interact with numerous targets, so deconstructing how a drug works is nontrivial. Part of the motivation for PZM21 is to try to separate out the downstream effects of hitting the mu-opioid receptor as a "biased" ligand. I think PZM21, with its new scaffold, will help disentangle the effects of classical opioids.

[1] https://www.eyesopen.com/rocs

Any concerns that PZM21 will be an even 'better' designer drug than U47700, O-DSMT, MPP, hell even heroin? Especially due to the adversarial nature of clandestine chemists and their respective nations' law enforcement agencies. Then again, taking a peek at PZM21's shape, good luck out there to all the non-Sigma-Aldrich-tier chemists who want to make their own, lol.

Hey, are there any datasets related to this stuff publicly available? It would be awesome to put this up on Kaggle and let people compete to find the best model.

There are some, but more effort is needed.

http://deepchem.io/ is trying to set up standard data sets for chemoinformatics/machine learning.

ChEMBL and PubChem are the big public repositories, though some care must be taken when curating data from them for machine learning.