by heycosmo 2020 days ago
> I don't think I fully understood this, but I'll give it a shot anyway. If your artificial sequence aligns with others, there's a chance that it will fold like them, depending on the quality and accuracy of the multiple sequence alignment. Since multiple sequence alignments are built under the assumption of homology (all sequences have a common ancestor), it's a matter of how far from the "sequence sampling space" your sequence is located compared to the others.

I understand that similar sequences may fold similarly (although as length increases, I highly doubt it, but IDK). I'm talking about aligned sub-sequences within one chain and their ultimate distance from each other in the final structure. Co-evolution suggests that aligned sub-sequences are also proximal. But manufactured chains did not evolve, so that assumption no longer holds.


Oh, I see! Yes, an intrachain alignment of an artificial sequence does not by itself give any information about co-evolution, especially since you don't know whether your protein is actually folding. To assess co-evolution you need a multiple sequence alignment between protein homologs containing correlated mutations.
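One standard way to put a number on "correlated mutations" is the mutual information (MI) between two columns of an MSA: it is zero when the residues at the two positions vary independently and grows when they change together. Below is a minimal sketch on a made-up four-sequence alignment (the `msa` list and the helper name are hypothetical, not real data or a library API); real pipelines add sequence weighting, pseudocounts and corrections for shared ancestry, so treat it as an illustration of the idea only.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    High values mean the residues at i and j tend to change together across
    the homologs (correlated mutations). No sequence weighting or pseudocounts."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy MSA of four homologs (not real data).
msa = ["AKLE", "AKVE", "GRLE", "GRVE"]
print(round(mutual_information(msa, 0, 1), 3))  # 0.693: A pairs with K, G with R
print(round(mutual_information(msa, 0, 2), 3))  # 0.0: column 2 varies independently
print(round(mutual_information(msa, 0, 3), 3))  # 0.0: column 3 is fully conserved
```

The score only means something because each row is a different homolog; aligning sub-sequences of a single artificial chain against each other produces no such columns to compare.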

> I understand that similar sequences may fold similarly (although as length increases, I highly doubt it, but IDK).

As long as sequence similarity is maintained between those sequences, length is not an issue.
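One simple way to quantify "sequence similarity" is percent identity over aligned positions. Because it is a fraction of compared positions rather than a raw count, a 10-residue pair and a 20-residue pair with the same proportion of mismatches get the same score. A minimal sketch, assuming the two sequences are already aligned to equal length with '-' for gaps (the function name is hypothetical):

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the result is
    normalized by the number of compared positions, not by total length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y  # True counts as 1
    return 100.0 * identical / compared if compared else 0.0


short_a, short_b = "ACDEFGHIKL", "ACDEFGHIKV"  # 1 mismatch in 10
print(percent_identity(short_a, short_b))          # 90.0
print(percent_identity(short_a * 2, short_b * 2))  # 90.0 (2 mismatches in 20)
```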

> Co-evolution suggests that aligned sub-sequences are also proximal

What do you mean by "proximal"? Close in space, or similar in structure?

> To assess co-evolution you need a multiple sequence alignment between protein homologs containing correlated mutations.

That makes sense. So in the CASP competition, when teams are given a sequence, do their algorithms do something like the following?

1. Search database for homologs of given sequence
2. Look at MSA and correlated mutations of homologs
3. Look for similar correlated mutations in given sequence
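A rough sketch of steps 2 and 3, assuming the MSA from step 1 already exists (for example from a homology search tool such as HHblits or jackhmmer) and that the query is its first row. In this kind of scheme steps 2 and 3 collapse into one: the query is itself a row of the MSA, so covariation is measured across the homologs' columns and then mapped onto the query's residue numbering. Scoring here uses mutual information with the average product correction (APC), a classic pre-deep-learning covariation approach; it is not a claim about what any particular CASP team does, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (nats) between alignment columns i and j (no weighting)."""
    n = len(msa)
    ci = [seq[i] for seq in msa]
    cj = [seq[j] for seq in msa]
    pi, pj, pij = Counter(ci), Counter(cj), Counter(zip(ci, cj))
    return sum((c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())


def apc_scores(msa):
    """Step 2: MI for every column pair, minus the average product correction (APC).

    APC subtracts the background covariation each column shows with all other
    columns, which partly removes signal from shared ancestry. Assumes >= 2 columns."""
    length = len(msa[0])
    mi = {(i, j): mutual_information(msa, i, j)
          for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_mean = [sum(v for pair, v in mi.items() if k in pair) / (length - 1)
                for k in range(length)]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return mi  # no covariation at all, so there is nothing to correct
    return {(i, j): v - col_mean[i] * col_mean[j] / overall
            for (i, j), v in mi.items()}


def predict_contacts(msa, top_k=3, min_separation=1):
    """Step 3: map the best-scoring column pairs onto the query, taken to be msa[0].

    Returns 0-based residue indices of the query with gap columns ('-') removed."""
    col_to_res, res = {}, 0
    for col, aa in enumerate(msa[0]):
        if aa != "-":
            col_to_res[col] = res
            res += 1
    ranked = sorted(
        ((score, col_to_res[i], col_to_res[j])
         for (i, j), score in apc_scores(msa).items()
         if i in col_to_res and j in col_to_res
         and col_to_res[j] - col_to_res[i] >= min_separation),
        reverse=True,
    )
    return [(ri, rj) for _, ri, rj in ranked[:top_k]]


# Hypothetical toy alignment: row 0 is the query; columns 0 and 3 co-vary (A-K vs G-R).
msa = ["ALEK", "AVEK", "GLER", "GVER"]
print(predict_contacts(msa, top_k=1))  # [(0, 3)]
```

Later methods replaced MI with APC by global statistical models such as direct coupling analysis (DCA), and then by deep networks trained on known structures, which is close to the idea in the next comment.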

I imagine 1-3 could somehow be embedded in a NN after training on a protein database.

> What do you mean by "proximal"? Close in space, or similar in structure?

I mean close in space.
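If "close in space" is read as "in contact", a widely used definition (the one used in CASP's residue-residue contact prediction category) counts two residues as in contact when their Cβ atoms (Cα for glycine) are within 8 Å. A minimal sketch on made-up coordinates; the function name and the default sequence-separation cutoff of 6 are assumptions for illustration (CASP further splits contacts into short-, medium- and long-range classes by sequence separation).

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return residue pairs (i, j), i < j, whose representative atoms are closer than `threshold` Å.

    coords: one (x, y, z) tuple per residue, e.g. the C-beta atom (C-alpha for glycine).
    Pairs fewer than `min_separation` residues apart in sequence are skipped, since
    neighbors along the chain are trivially close in space."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.append((i, j))
    return contacts


# Made-up coordinates: an extended chain along x (3.8 Å steps) whose last
# residue folds back next to the start.
coords = [(3.8 * k, 0.0, 0.0) for k in range(7)] + [(0.0, 5.0, 0.0)]
print(contact_map(coords))  # [(0, 7), (1, 7)]
```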