Hacker News
by tea-coffee 768 days ago
This is a basic question, but how is the accuracy of the predicted biomolecular interactions measured? Are the predicted interactions compared to known interactions? How would the accuracy of predicting unknown interactions be assessed?

Accuracy can be assessed in two main ways: computationally and experimentally. Computationally, they would compare the predicted structures and interactions with known data from databases like the PDB (Protein Data Bank). Experimentally, they can use techniques like X-ray crystallography and NMR (nuclear magnetic resonance) to obtain the actual molecular structure and compare it to the predicted result. The outcomes of each approach would be fed back into the model to refine future predictions.

https://www.rcsb.org/
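
To make the computational comparison above concrete, here is a minimal sketch of a superposition-free score that checks how well a prediction reproduces the pairwise distances of a reference structure. It is loosely modeled on lDDT-style distance-difference scoring, heavily simplified: it treats every point as a single atom (for example one C-alpha per residue), and the function name, the 15 Å cutoff, and the tolerance values are assumptions for illustration, not the evaluation code used by any particular paper.

```python
import math
from itertools import combinations


def distance_agreement(predicted, reference, cutoff=15.0,
                       tolerances=(0.5, 1.0, 2.0, 4.0)):
    """Score in [0, 1]: how well `predicted` preserves reference distances.

    Both arguments are equal-length lists of (x, y, z) tuples in angstroms,
    with atom i in `predicted` corresponding to atom i in `reference`.
    Only pairs closer than `cutoff` in the reference are considered.
    """
    if len(predicted) != len(reference):
        raise ValueError("coordinate lists must have the same length")

    preserved = 0   # number of (pair, tolerance) checks that passed
    considered = 0  # number of reference pairs within the cutoff
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= cutoff:
            continue
        d_pred = math.dist(predicted[i], predicted[j])
        considered += 1
        preserved += sum(abs(d_pred - d_ref) < t for t in tolerances)

    if considered == 0:
        raise ValueError("no atom pairs within the cutoff")
    return preserved / (considered * len(tolerances))


# Hypothetical toy coordinates: three atoms on a line, 3.8 angstroms apart.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(distance_agreement(reference, reference))
print(distance_agreement(distorted, reference))
```

Because the score uses only internal distances, translating or rotating the prediction does not change it, so no structural alignment step is needed before comparing.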

AlphaFold very explicitly (unless something has changed) removes NMR structures as references because they are not accurate enough. I have a PhD in NMR biomolecular structure and I wouldn't trust the structures for anything.
Sorry, I don’t mean to be dense - do you mean you don’t trust AlphaFold’s structures or NMR structures?
I don't trust NMR structures in nearly all cases. The reasons are complex enough that I don't think it's worthwhile to discuss on Hacker News.
Hmm, I would say it's always worth sharing knowledge. Could you paste some links or maybe type a few keywords for anyone willing to research the topic further on their own?
Read this, and recursively (breadth-first) read all its transitive references: https://www.sciencedirect.com/science/article/pii/S096921262...
Looking at the supplementary material (section 2.5.4) for the AlphaFold 3 paper it reads to me like they still use NMR structures for training, but not for evaluating performance of the model.
I think it's implicit in their description of filtering the training set, where they say they only include structures with a resolution of 9 Å or less. NMR structures don't really have a resolution; that's more specific to crystallography. However, I can't actually verify that no NMR structures were included without directly inspecting their list of selected structures.
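
To illustrate why this is ambiguous: whether a resolution cutoff drops NMR entries depends entirely on how entries with no resolution value are treated. Below is a minimal sketch with made-up records; the field names, entry IDs, and the `keep_missing` switch are all hypothetical and say nothing about what AlphaFold's actual pipeline does.

```python
from typing import List, Optional, TypedDict


class Entry(TypedDict):
    entry_id: str                 # hypothetical identifier, not a real PDB ID
    method: str                   # experimental method label
    resolution: Optional[float]   # angstroms; None when no value is reported


def filter_by_resolution(entries: List[Entry], max_resolution: float = 9.0,
                         keep_missing: bool = False) -> List[Entry]:
    """Keep entries whose resolution is at or below `max_resolution`.

    `keep_missing` decides the fate of entries with no resolution at all,
    which is the policy a written description may leave unstated.
    """
    kept = []
    for entry in entries:
        resolution = entry["resolution"]
        if resolution is None:
            if keep_missing:
                kept.append(entry)
        elif resolution <= max_resolution:
            kept.append(entry)
    return kept


entries: List[Entry] = [
    {"entry_id": "xray_1", "method": "X-ray", "resolution": 1.9},
    {"entry_id": "em_1", "method": "cryo-EM", "resolution": 3.4},
    {"entry_id": "em_low", "method": "cryo-EM", "resolution": 12.0},
    {"entry_id": "nmr_1", "method": "NMR", "resolution": None},
]

strict = [e["entry_id"] for e in filter_by_resolution(entries)]
lenient = [e["entry_id"] for e in filter_by_resolution(entries, keep_missing=True)]
print(strict)   # NMR entry dropped because it has no resolution
print(lenient)  # NMR entry retained
```

Both policies are consistent with a sentence like "only structures with resolution of 9 Å or less", which is why inspecting the actual selected list (or the selection script) is the only way to be sure.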
I think it is very plausible that they don't use NMR structures here, but I was looking for a specific statement on it in the paper, and I don't think the paper is clear enough to be sure about this interpretation.
Yes, thanks for calling that out. In verifying my statement I actually was confused because you can see they filter NMR out of the eval set (saying so explicitly) but don't say that in the test set section (IMHO they should be required to publish the actual selection script so we can inspect the results).
interesting observation and experience. must have made thesis development complex, assuming the realization dawned on you during the phd.

what do you trust more than NMR?

AF's dependence on MSAs also seems sub-optimal; curious to hear your thoughts?

that said, it's understandable why they used MSAs, even if it seems to hint at winning CASP more than developing a generalizable model.

arguably, MSA-dependence is the wise choice for early prediction models as demonstrated by widespread accolades and adoption, i.e., it's an MVP with known limitations as they build toward sophisticated approaches.

My realizations happened after my PhD. When I was writing my PhD I still believed we would solve the protein folding and structure prediction problems using classical empirical force fields.

It wasn't until I started my postdocs, where I started learning about protein evolutionary relationships (and competing in CASP), that I changed my mind. I wouldn't say it so much as "multiple sequence alignments"; those are just tools to express protein relationships in a structured way.

If AlphaFold now, or in the future, requires no evolutionary relationships based on sequence (UniProt) and can work entirely by training on just the proteins in PDB (many of which are evolutionarily related) and still be able to predict novel folds, it will be very interesting times. The one thing I have learned is that evolutionary knowledge makes many hard problems really easy, because you're taking advantage of billions of years of nature and an easy readout.

Would you trust the CryoEM structures more?
yes, albeit with significant filtering.
Nice to see you on this thread as well! :)