Hacker News
by DrScientist 16 days ago
You are missing the point. Sure, a particular enzyme's function is resilient to large amounts of substitution, because:

1. The number of residues actively involved in catalysis might be small.
2. Most other residues can be safely replaced: with something similar if they are part of the structure, or with anything if the side chain points out from the surface.

However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

Is that because the space of stable protein folds is actually small (due to the limited secondary-structure patterns used, etc.), or because evolution hasn't had time to search the enormous available structural space?

In other words, is it a sampling problem or an intrinsic property of protein space?

The fact that some of the ML approaches mentioned can now design completely novel folds suggests it is at least partially a sampling problem.

This isn't surprising to me. The idea that evolution is somehow complete, with all possible solutions already explored, seems unlikely. A lot of evolution happens via gene duplication followed by gradual functional drift, which would favour reuse of existing folds over the generation of completely new ones.


I have a 30 year old book on protein structure on my shelf. One of the primary themes is the recurrence of the same structural motifs in proteins. The fact that biologic proteins use the same patterns for different functions isn't new information.

The result also fits in with the rest of biochemistry. While there is a vast variety of interesting chemicals in living things, and they do all sorts of amazing stuff, there are really only a handful of classes of chemicals.

The variety of classes of chemicals that can exist dwarfs what gets used in biochemistry. Why would we expect structure to be different?

We're in agreement though, that it would be interesting to understand what the constraints are.

> I have a 30 year old book on protein structure on my shelf. One of the primary themes is the recurrence of the same structural motifs in proteins.

What you have to be careful about here is that the structures available 30 years ago were quite strongly biased by what was experimentally tractable, i.e. the recurrence of the same folds is partly related to what crystallised well.

> The fact that biologic proteins use the same patterns for different functions isn't new information.

Absolutely. The question is how big the space is, and what percentage of it we have already seen.

> The variety of classes of chemicals that can exist dwarfs what gets used in biochemistry. Why would we expect structure to be different?

That depends on whether the structural universe is a small, almost fully explored subset for that very reason, i.e. biology has chosen a structural subset of possible chemical space by choosing a tiny subset of chemistry.

> What you have to be careful about here is that the structures available 30 years ago were quite strongly biased by what was experimentally tractable, i.e. the recurrence of the same folds is partly related to what crystallised well.

It was biased in some sense towards those things that could be crystallized, but at that time we were already seeing the same sorts of recurring motifs with cryo-EM, which is much less restrictive in the required preparations (purify it and flash-freeze it).

In the last 30 years there's nothing that has overturned the recurrence of motifs in protein structure. It's just become more and more established.

This paper confirms that.

The methods they're using are interesting, but the fundamental result isn't surprising.

No one is arguing that reuse is a surprise. There used to be a joke in the early days of protein fold prediction: if the protein's amino acid sequence was a certain length, you could just predict a TIM barrel and you'd be right.

The question is a separate one: how much of the protein fold universe have we already seen?

So we're in agreement. The expectation is that biochemistry here on Earth only produces a small proportion of the possible structures.

> However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

That's a basic fact in bio. See the Rossmann fold page, for example (https://en.wikipedia.org/wiki/Rossmann_fold): it's a template used for many functions.

Same with the TIM barrel fold. It can catalyze a wide range of reactions.

It seems obvious that it's at least a sampling problem. Assuming an average protein length of 400 amino acids and 20 possible amino acids, that's 20^400, or about 10^520, possible sequences, which is a mind-bogglingly large number.

We haven't even begun to explore the biological universe.
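The 10^520 figure can be checked in a few lines of Python. This is a minimal sketch that simply assumes, as the comment does, a single fixed length of 400 residues with any of the 20 standard amino acids allowed at every position:

```python
import math

# Assumptions taken from the comment: one fixed length, 20 choices per position.
LENGTH = 400
ALPHABET_SIZE = 20

# Exact count of possible sequences (Python integers have arbitrary precision).
n_sequences = ALPHABET_SIZE ** LENGTH

# Order of magnitude: log10(20^400) = 400 * log10(20), roughly 520.4
exponent = LENGTH * math.log10(ALPHABET_SIZE)

print(f"{ALPHABET_SIZE}^{LENGTH} has {len(str(n_sequences))} digits")
print(f"which is about 10^{exponent:.1f}")
```

Allowing other lengths changes the exact count but not the basic point: the raw sequence space is astronomically larger than anything evolution could have sampled.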

Sure, though because of the functional overlap between amino acids already discussed, the functional/structural space could be a lot smaller (though still massive). For example, is choosing D or E at a particular position really "different" in most situations?

And if you go up a level of abstraction and say there are four(ish) basic types of secondary structure (helix, turn, sheet, disordered), you could argue the structural space is smaller still.
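The collapsing effect described in the last two paragraphs can be sketched by mapping each residue onto a coarser alphabet and comparing the sizes of the resulting spaces. The six residue classes below are a hypothetical, illustrative grouping (not a standard classification), and the four secondary-structure states are the ones listed above. Neither is meant as a real model of which substitutions preserve a fold:

```python
import math

# Hypothetical, illustrative grouping of the 20 standard amino acids into 6 classes.
RESIDUE_CLASSES = {
    "h": "AVLIMC",  # hydrophobic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "a": "DE",      # acidic
    "b": "KRH",     # basic
    "s": "GP",      # structurally special
}

# Invert into a per-residue lookup table: residue letter -> class code.
TO_CLASS = {aa: code for code, members in RESIDUE_CLASSES.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map a one-letter amino acid sequence onto the coarse class alphabet."""
    return "".join(TO_CLASS[aa] for aa in seq.upper())


def log10_space(alphabet_size: int, length: int) -> float:
    """log10 of the number of sequences of a given length over an alphabet."""
    return length * math.log10(alphabet_size)


# Choosing D or E at a position makes no difference once residues are grouped.
print(reduce_sequence("MKDLE"), reduce_sequence("MKELD"))

for label, size in [("amino acids", 20), ("residue classes", 6), ("secondary structure states", 4)]:
    print(f"{label:>27}: about 10^{log10_space(size, 400):.0f} sequences of length 400")
```

Each coarsening step removes tens to hundreds of orders of magnitude, although even a string of secondary-structure states is still not the same thing as a fold.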

Or, to put it another way: if sequences with 30% identity or lower can share the same fold, that's an awful lot of different unique combinations collapsing into a single structural space.
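To make the "30% identity" figure concrete: sequence identity is usually reported as the percentage of aligned positions holding the same residue. The sketch below assumes the two sequences are already aligned to the same length with no gaps. A real comparison would first run an alignment (e.g. Needleman-Wunsch or Smith-Waterman), and tools differ in which length they divide by. The example sequences are made up:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in two pre-aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Made-up, pre-aligned 10-residue sequences that share 3 positions.
seq1 = "MKVLAEGHTW"
seq2 = "MRILDEAQSY"
print(f"{percent_identity(seq1, seq2):.0f}% identity")
```

At 30% identity, seven out of every ten positions differ, yet sequences in that range can still adopt the same fold.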

And on the flip side, what we don't know is what percentage of sequence space doesn't actually result in a functional fold, i.e. results instead in instability or in multiple stable or unstable conformations.

So it could be that we have already seen close to all the possible folds (where a fold is a single stable form; obviously there are quite a lot of disordered states, but I'm not including those in a 'fold', even if evolution uses unstructured states as well).