by trott 618 days ago
> There are few approaches that will accelerate the field of drug development and chemistry as a whole in a way that the works of these three people will.

As the author of one such approach, I'm skeptical.

AlphaFold 2 just predicts protein structures. The thing about proteins is that they are often related to each other. If you are trying to predict the structure of a naturally occurring protein, chances are that there are related ones in the dataset of known 3D structures. This makes it much easier for ML. You are (roughly speaking) training on the test set.
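To make the "training on the test set" point concrete, here is a minimal sketch of how one might check whether test proteins have close relatives in the training data. Everything in it is an assumption for illustration: `identity` uses Python's `difflib` as a crude stand-in for a real alignment-based sequence identity, `split_by_similarity` and the 0.5 threshold are hypothetical, and the sequences are made up. It is not how AlphaFold is evaluated.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b).ratio()


def split_by_similarity(train, test, threshold=0.5):
    """Split test sequences into those with a close relative in train and those without.

    The threshold is an illustrative assumption, not a standard cutoff.
    """
    related, novel = [], []
    for seq in test:
        # Best similarity to any training sequence (0.0 if train is empty).
        best = max((identity(seq, t) for t in train), default=0.0)
        (related if best >= threshold else novel).append(seq)
    return related, novel


# Toy, made-up sequences: one near-duplicate of a training entry, one unrelated.
train = ["MKTAYIAKQRQISFVKSHFSRQ"]
test = ["MKTAYIAKQRQISFVKSHFSRA", "GGGGGGGGGG"]
related, novel = split_by_similarity(train, test)
```

Accuracy reported only on the `related` group would say little about performance on the `novel` group, which is the case that matters for the argument here.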

However, for drug design, which is what AlphaFold 3 targets, you need to do well on actually novel inputs. It's a completely different use case.

More here: https://olegtrott.substack.com/p/are-alphafolds-new-results-...

1 comments

Protein structures are similar to each other because of evolution (protein families exist because of shared ancestry of protein-coding genes). It's not a weird coincidence that helps ML; it's inherent in the problem. Same with drug design -- very, very few drugs are "novel" as opposed to being analogues of something naturally in the body.
They're referring to the structure of the protein when a drug is bound; that's what's novel. Novel as in, you can't think of it as "just" interpolation between known structures of evolutionarily related proteins.

That said, I'm not sure that's entirely fair, since AlphaFold does, as far as I know, work for predicting structures that are far away from structures that have previously been measured.

You're quite wrong about small molecule drug structures. Historically that has been the case, but these days many lead structures are made by combinatorial chemistry and are not derived from natural products.

> AlphaFold does, as far as I know, work for predicting structures that are far away from structures that have previously been measured.

It did very poorly at this last time I checked. Maybe AlphaFold 3 is better?

But even drugs made by combinatorial chemistry still generally end up being analogues of natural products, even if they aren't derived from them. As Leslie Orgel said, "Evolution is cleverer than you are"; chemists are unlikely to discover a mechanism of action that millions of years of evolution hasn't already found.
I... don't think that's right? Although I would appreciate being corrected with some good sources on this. It's a fast-moving field, and combinatorial chemistry is still new enough that many recently published structures wouldn't have used it.

I'm well aware of the impact of natural products and particularly plant secondary metabolites in drug discovery. I'm also aware of combinatorial synthesis occasionally hitting structures that are close to natural products.

But from first principles, why would you need to limit yourself to that subset of molecular space?

Obviously, your structure will need to look vaguely biochemical to be compatible with the body's chemical environment, but natural products are limited to biochemically feasible syntheses, and are therefore dominated by structures derived from natural amino acids and similar basic biochemical building blocks.

For a concrete example off the top of my head, I'm not aware of any natural diazepines - the structure looks "organic" but biochemistry doesn't often make seven-membered rings, and diazepine drugs were made long before combinatorial chemistry. Might be wrong on this one, since there's so much out there, but I think it holds.

Perhaps we are using "structure" in different senses. Yes, it is possible to generate a molecule with a chemical structure unlike any biological molecule and have it bind to a protein, but it can only do so if its 3D structure is analogous to what naturally binds there. Natural products are a source of drugs because evolution has already done this work for us.
https://en.wikipedia.org/wiki/Functional_analog_(chemistry) explains the difference between structural and functional analogs: fentanyl is quite dissimilar from morphine, but binds the same targets.
> It's not a weird coincidence that helps ML; it's inherent in the problem.

This depends on the application. If you are trying to design new proteins for something, unconstrained by evolution, you may want a method that does well on novel inputs.

> Same with drug design

Not by a long shot. There are maybe on the order of 10,000 known 3D protein-ligand structures. Meanwhile, when doing drug discovery, people scan drug libraries with millions to billions of molecules (using my software, oftentimes). These molecules will be very poorly represented in the training data.

The theoretical chemical space of interest to drug discovery is bigger still, with on the order of 1e60 molecules in it: https://en.wikipedia.org/wiki/Chemical_space
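
As a rough back-of-the-envelope check on the numbers above, the gaps can be expressed in orders of magnitude. The figures are the approximate ones quoted in this comment (about 10,000 known protein-ligand structures, libraries of millions to billions of molecules, roughly 1e60 molecules in chemical space); the helper name `orders_of_magnitude_gap` is just for illustration.

```python
import math

# Approximate figures quoted in the comment above; all are order-of-magnitude estimates.
KNOWN_COMPLEXES = 1e4             # known 3D protein-ligand structures
LIBRARY_SIZES = (1e6, 1e9)        # screening libraries: millions to billions
CHEMICAL_SPACE = 1e60             # theoretical drug-like chemical space


def orders_of_magnitude_gap(small: float, large: float) -> float:
    """How many powers of ten separate two counts."""
    return math.log10(large) - math.log10(small)


library_gaps = [orders_of_magnitude_gap(KNOWN_COMPLEXES, n) for n in LIBRARY_SIZES]
space_gap = orders_of_magnitude_gap(KNOWN_COMPLEXES, CHEMICAL_SPACE)

print("library vs. known structures (orders of magnitude):",
      [round(g) for g in library_gaps])
print("chemical space vs. known structures (orders of magnitude):",
      round(space_gap))
```

On these estimates, a screening library is two to five orders of magnitude larger than the set of known complexes, and the theoretical space is dozens of orders of magnitude larger still.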