by trott 646 days ago
I'm the author of AutoDock Vina (the most cited docking program, and the "runner-up" in the AlphaFold 3 paper).

Docking software is used to scan millions to billions of drug-like molecules, looking for new potential binders. So it needs to be able to generalize, rather than just memorize.

But the evaluation approach used here and in the original paper (1) does not test how well the software will perform on novel molecules, because the test set is related to the training set.

If you understand the basics of ML and physics, you may be interested in my detailed critique here: https://olegtrott.substack.com/p/are-alphafolds-new-results-...

I'm glad that Chai-1 has been released though, as this will probably help people evaluate the method better.

(1) It looks like the two approaches differ slightly, as this paper allows 40% sequence identity. That is still high. I believe that sequences with 40% identity tend to have the same shape, especially in the binding site, where it matters.
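To make the leakage concern concrete, here is a minimal sketch of how one might flag test proteins that are too similar to anything in the training set. It is not the filtering procedure from either paper. The identity definition (identical positions in a simple global alignment, divided by the shorter sequence's length), the scoring values, the function names `sequence_identity` and `flag_related`, and the toy sequences are all assumptions made for illustration; real pipelines use dedicated alignment and clustering tools.

```python
from typing import Dict, List


def sequence_identity(a: str, b: str, match: int = 1,
                      mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical aligned positions in a global alignment.

    Assumed definition: identical positions / length of the shorter sequence.
    Each cell stores (alignment score, identical positions), so taking max()
    keeps the best score and breaks ties by the number of identities.
    """
    if not a or not b:
        return 0.0
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [(j * gap, 0) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [(i * gap, 0)]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diag = (prev[j - 1][0] + (match if same else mismatch),
                    prev[j - 1][1] + int(same))
            up = (prev[j][0] + gap, prev[j][1])
            left = (curr[j - 1][0] + gap, curr[j - 1][1])
            curr.append(max(diag, up, left))
        prev = curr
    return prev[len(b)][1] / min(len(a), len(b))


def flag_related(test: Dict[str, str], train: Dict[str, str],
                 threshold: float = 0.4) -> List[str]:
    """Names of test sequences whose identity to any training sequence
    exceeds `threshold` (0.4 mirrors the 40% figure discussed above)."""
    flagged = []
    for name, seq in test.items():
        best = max((sequence_identity(seq, t) for t in train.values()),
                   default=0.0)
        if best > threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    train = {"trainA": "MKTAYIAKQR"}
    test = {"near": "MKTAYIAKQL", "far": "GGGGGWWWWW"}
    print(flag_related(test, train))
```

Lowering `threshold` makes the split stricter. The point above is that even a test set that passes a 40% filter can still contain proteins whose binding sites look much like ones seen in training.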

1 comment

Thanks for your work, and also for your comments on AF3 and Chai-1. It sounds like you are implying that there may be both gross and subtle kinds of data leakage between the training and test sets, resulting in what seem to be inflated performance metrics? These are pretty serious issues if so. I would also agree with previous authors that a marginal improvement over SOTA is more a sign that they have recreated something than that they have made significant new progress. But this has been an issue with LLMs for some time now. Still, it sounds like they have some bright engineers from well-known companies who are coming together, with some VC backing, to try to do something in this space. I do appreciate that the weights are open. I would like to learn more about their future direction and their training methods.