
	by dekhn 768 days ago
	AlphaFold very explicitly (unless something has changed) removes NMR structures as references because they are not accurate enough. I have a PhD in NMR biomolecular structure and I wouldn't trust the structures for anything.

5 comments

JackFr 768 days ago

Sorry, I don't mean to be dense, but do you mean you don't trust AlphaFold's structures or the NMR ones?

dekhn 768 days ago

I don't trust NMR structures in nearly all cases. The reasons are complex enough that I don't think it's worthwhile to discuss on Hacker News.

fikama 768 days ago

Hmm, I would say it's always worth sharing knowledge. Could you paste some links, or maybe type a few keywords, for anyone willing to research the topic further on their own?

dekhn 768 days ago

Read this, and recursively (breadth-first) read all its transitive references: https://www.sciencedirect.com/science/article/pii/S096921262...

fabian2k 768 days ago

Looking at the supplementary material (section 2.5.4) for the AlphaFold 3 paper it reads to me like they still use NMR structures for training, but not for evaluating performance of the model.

dekhn 768 days ago

I think it's implicit in their description of filtering the training set, where they say they only include structures with a resolution of 9 Å or less. NMR structures don't really have a resolution; that's more specific to crystallography. However, I can't actually verify that no NMR structures were included without directly inspecting their list of selected structures.

fabian2k 768 days ago

I think it is very plausible that they don't use NMR structures here, but I was looking for a specific statement on it in the paper, and I don't think the paper is clear enough to be sure about this interpretation.

dekhn 768 days ago

Yes, thanks for calling that out. In verifying my statement I was actually confused, because you can see they filter NMR out of the eval set (saying so explicitly) but don't say that in the training set section. (IMHO they should be required to publish the actual selection script so we can inspect the results.)

fabian2k 768 days ago

Hmm, in the earlier AlphaFold 2 paper they state:

> Input mmCIFs are restricted to have resolution less than 9 Å. This is not a very restrictive filter and only removes around 0.2% of structures

NMR structures make up more than 0.2% of the PDB, so that doesn't fit the assumption that they implicitly remove NMR structures here. But if I filter by resolution on the PDB homepage, it does remove essentially all NMR structures. I'm really not sure what to think here; the description seems too vague to know exactly what they did.
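
To make the ambiguity concrete, here is a minimal sketch of a resolution filter. It is not the AlphaFold selection script (which, as noted above, is not published); the `Entry` type, the `keep_missing` flag, and the toy entries are all hypothetical and exist only for illustration. The point is that the outcome for NMR entries depends entirely on how a missing resolution value is treated.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical, simplified structure record (not a real PDB schema)."""
    entry_id: str
    method: str
    resolution: Optional[float]  # in angstroms; None when no resolution is reported


def filter_by_resolution(
    entries: List[Entry],
    max_resolution: float = 9.0,
    keep_missing: bool = False,
) -> List[Entry]:
    """Keep entries whose resolution is below max_resolution.

    keep_missing decides what happens to entries with no resolution at all,
    which is the typical case for NMR structures.
    """
    kept = []
    for entry in entries:
        if entry.resolution is None:
            if keep_missing:
                kept.append(entry)
        elif entry.resolution < max_resolution:
            kept.append(entry)
    return kept


# Toy entries, invented for illustration only.
TOY_ENTRIES = [
    Entry("toy_xray", "X-RAY DIFFRACTION", 2.1),
    Entry("toy_em_low", "ELECTRON MICROSCOPY", 12.0),
    Entry("toy_em", "ELECTRON MICROSCOPY", 3.4),
    Entry("toy_nmr", "SOLUTION NMR", None),
]

strict = filter_by_resolution(TOY_ENTRIES)
lenient = filter_by_resolution(TOY_ENTRIES, keep_missing=True)

print("missing resolution dropped:", [e.entry_id for e in strict])
print("missing resolution kept:   ", [e.entry_id for e in lenient])
```

Under the strict reading the NMR entry disappears as a side effect of the filter; under the lenient reading it survives and only the low-resolution entry is removed. The "0.2% removed" figure quoted from the paper would be more consistent with the lenient reading, but without the actual script that stays a guess.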

panabee 768 days ago

interesting observation and experience. must have made thesis development complex, assuming the realization dawned on you during the phd.

what do you trust more than NMR?

AF's dependence on MSAs also seems sub-optimal; curious to hear your thoughts?

that said, it's understandable why they used MSAs, even if it seems to hint at winning CASP more than developing a generalizable model.

arguably, MSA-dependence is the wise choice for early prediction models as demonstrated by widespread accolades and adoption, i.e., it's an MVP with known limitations as they build toward sophisticated approaches.

dekhn 768 days ago

My realizations happened after my PhD. When I was writing my PhD I still believed we would solve the protein folding and structure prediction problems using classical empirical force fields.

It wasn't until I started my postdocs, where I began learning about protein evolutionary relationships (and competing in CASP), that I changed my mind. I wouldn't frame it as "multiple sequence alignments" so much; those are just tools to express protein relationships in a structured way.

If AlphaFold now, or in the future, requires no evolutionary relationships based on sequence (UniProt), can work entirely by training on just the proteins in the PDB (many of which are evolutionarily related), and is still able to predict novel folds, it will be very interesting times. The one thing I have learned is that evolutionary knowledge makes many hard problems really easy, because you're taking advantage of billions of years of nature and an easy readout.

carlsborg 767 days ago

Would you trust the CryoEM structures more?

dekhn 767 days ago

yes, albeit with significant filtering.

heyoni 768 days ago

Nice to see you on this thread as well! :)
