by trivexwe 765 days ago

Weird article.

It claims multiple times that ~“the protein folding problem is solved”, yet it also says in multiple places ~“but there are limitations to this technique and it is often missing crucial details”.

It really is difficult to conceptualize these highly nonlinear problem spaces, like protein folding, until you attempt to work with them.

Many in software development have an intuitive understanding of the difficulty, as evidenced by the community’s ~“the last 10% took 100% of the time” meme.

Even in nonlinear problem spaces you have “trivial” solutions.

Terry Tao famously coauthored a paper on arithmetic progressions of primes.[1] The sequences found are “trivial” in terms of “solving the prime sequence problem” in that they are sparse, the sequences are finite, and the progressions lack a method for finding more.

These machine learning tools are by design approximation engines. I’m unaware of any results that prove one way or the other that it is possible to pass a bound of approximation that provides exact solutions. (Think of an approximate solution that only fails to provide exact solutions for cases that are trivial using a different method; I think a lot of work in p-adics is motivated similarly.)

I feel these machine learning techniques are expanding the definition of “trivial solutions” to include those capable of being solved by their convoluted methods (back prop, etc). Since this new subset of the space that can be labeled “solved” appears more complex than known trivial solutions, people assume the whole space must be known, and this is where the difficult conceptualization rears its influence.

Protein folding is still an unsolved problem, and I’m dubious of the notion machine learning will ever solve it, but hopefully we get some helpful science out of it.

[1] https://en.m.wikipedia.org/w/index.php?title=Green%E2%80%93T...

3 comments

eru 765 days ago

> Protein folding is still an unsolved problem, and I’m dubious of the notion machine learning will ever solve it, but hopefully we get some helpful science out of it.

As a working hypothesis, protein folding assumes that a protein folds into the globally lowest energy configuration. And that's a good assumption for a start.

However, nature isn't magic and can't magically solve global optimisation problems. If there's a region in configuration space with a local minimum and high enough energy 'walls', this might be stable enough for the protein to be stable.

For reasons of computational complexity, I agree that machine learning will probably never solve the global minimisation problem. But the complicated and messy local optimisation problem that we see in reality might very well be solvable eventually by something like machine learning.

Why are you dubious? Where do your objections come from?
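The gap between a global and a local minimum can be sketched with a toy example. This is only an illustration, not a model of any real protein: the tilted double-well `energy` function, the step size, and the starting points are all made-up assumptions. Plain gradient descent ends up in whichever well it starts near, and the shallower well is perfectly stable once reached.

```python
def energy(x):
    # Hypothetical tilted double-well: two minima, the left one is lower.
    return x**4 - 2 * x**2 + 0.3 * x


def grad(x):
    # Analytic derivative of energy(x).
    return 4 * x**3 - 4 * x + 0.3


def descend(x, step=0.01, iters=5000):
    # Plain gradient descent: always moves downhill, never over a barrier.
    for _ in range(iters):
        x -= step * grad(x)
    return x


left = descend(-0.5)   # starts on the left side of the barrier
right = descend(0.5)   # starts on the right side of the barrier

print(f"start -0.5 -> x = {left:.3f}, E = {energy(left):.3f}")
print(f"start +0.5 -> x = {right:.3f}, E = {energy(right):.3f}")
```

Both runs converge to a point where the gradient vanishes, but only the left one is the global minimum; nothing in the local dynamics tells the right-hand run that a lower state exists.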

gabia 764 days ago

Great points about the energy minimisation issue. Funnily enough, this is actually a problem with de-novo protein design at the moment: the designed proteins are _too_ stable compared to natural proteins! Proteins are often not static shapes, they are machines that need to be dynamic - in other words what you said, they do not live at some deep global optimum.

eru 764 days ago

Interesting point!

> [...] in other words what you said, they do not live at some deep global optimum.

I think what you said only depends on the minimum being relatively flat (instead of deep); but it doesn't matter whether it's global or local.

exmadscientist 764 days ago

> I think what you said only depends on the minimum being relatively flat (instead of deep); but it doesn't matter whether it's global or local

No. There is no such thing as a "global minimum" energy conformation, because the conditions vary wildly. Many protein structure changes are brought about by changes in the local chemical potentials and even electric fields. This is not something you can get a good grip on by thinking in terms of "flat minima".

trivexwe 765 days ago

> Why are you dubious? Where do your objections come from?

That the results the machine learning techniques provide are still nondeterministic.

Meaning that they are, in terms of identifying other local minima that satisfy the constraints, as good as a guess.

If the provided solution also came with a method of systemic modification to derive all other solutions that satisfy the constraints, then I would be satisfied.

Without that you are unable to say with certainty that your local minimum is correct even if nature fails to adhere to the lowest energy assumption.

> However, nature isn't magic and can't magically solve global optimisation problems.

I wonder sometimes. Let’s remember, this is an open question after all.

I have a long standing hypothesis that an algorithmic solution to the global optimization problem is what lends action potentials the appearance, or essence?, of what we mean when we speak of “consciousness”.

But I am more inclined toward the abstract aspects of the mathematics behind the problem, and leave advocacy for the current techniques to researchers developing practical solutions with them.

I applaud the people who toiled with X-ray crystallography to build the field to the point that a machine learning technique could be developed.

eru 764 days ago

> That the results the machine learning techniques provide are still nondeterministic.

I think I know what you are trying to say, but 'determinism' or not isn't the problem. You can run machine learning methods completely deterministically: just use a pseudo-random-number-generator (and be careful about how you seed it, and be wary of the problems with concurrency etc).
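A minimal sketch of that point, using only the standard library. The toy model (fitting `y = 3x` with stochastic gradient descent) and the name `noisy_fit` are invented for illustration; the only claim is that a private, seeded generator makes a "random" training run repeatable.

```python
import random


def noisy_fit(seed, steps=200):
    """Toy stochastic gradient descent fitting y = 3x, driven by its own PRNG."""
    rng = random.Random(seed)       # private generator, no shared global state
    w = rng.uniform(-1.0, 1.0)      # random initialisation
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # randomly sampled training input
        err = w * x - 3.0 * x       # prediction error on this sample
        w -= 0.1 * err * x          # gradient step on the squared error
    return w


a = noisy_fit(seed=42)
b = noisy_fit(seed=42)
c = noisy_fit(seed=7)

print(a == b)  # same seed: bit-for-bit the same run
print(a, c)    # a different seed gives a different trajectory
```

Giving each run its own `random.Random` instance, rather than relying on the module-level generator, is one way to sidestep the concurrency problems mentioned above; real frameworks have further sources of nondeterminism (parallel reductions, GPU kernels) that this sketch does not cover.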

> If the provided solution also came with a method of systemic modification to derive all other solutions that satisfy the constraints, then I would be satisfied.

> Without that you are unable to say with certainty that your local minimum is correct even if nature fails to adhere to the lowest energy assumption.

Have a look at how integer linear programming solvers work. They use plenty of heuristics and non-determinism for finding the solution, but at the end they can give you a proof that what they found is optimal.

You are right that you don't get that kind of guarantee with current machine learning approaches. Though you could modify them in that direction. (E.g. if you added machine learning to an integer linear programming solver, you would hook it in as a new heuristic, but you would still want the proof at the end.)
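To make the "heuristic plus proof" idea concrete, here is a small branch-and-bound sketch for the 0/1 knapsack problem. It is a toy, not how any production ILP solver is implemented, and the item list is invented. A greedy heuristic supplies a first guess (this is where a learned heuristic could be plugged in), and the search only discards a branch when a relaxation bound shows nothing in it can beat the best solution found so far. When the search finishes, those bounds together are the optimality proof.

```python
from itertools import product


def fractional_bound(items, cap, idx, value):
    """Upper bound from the LP relaxation: fill greedily, last item fractionally."""
    for v, w in items[idx:]:
        if w <= cap:
            cap -= w
            value += v
        else:
            return value + v * cap / w
    return value


def solve_knapsack(items, capacity):
    # Sort by value density so the greedy fill above solves the relaxation.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)

    # Heuristic incumbent: greedy by density (a guess, no guarantee).
    best, cap = 0, capacity
    for v, w in items:
        if w <= cap:
            cap -= w
            best += v

    pruned = 0
    stack = [(0, capacity, 0)]  # (next item index, remaining capacity, value)
    while stack:
        idx, cap, value = stack.pop()
        best = max(best, value)
        if idx == len(items):
            continue
        if fractional_bound(items, cap, idx, value) <= best:
            pruned += 1  # certificate: this subtree cannot beat the incumbent
            continue
        v, w = items[idx]
        stack.append((idx + 1, cap, value))              # branch: skip item
        if w <= cap:
            stack.append((idx + 1, cap - w, value + v))  # branch: take item
    return best, pruned


def brute_force(items, capacity):
    # Exhaustive check, only feasible for tiny instances.
    best = 0
    for choice in product((0, 1), repeat=len(items)):
        if sum(c * w for c, (_, w) in zip(choice, items)) <= capacity:
            best = max(best, sum(c * v for c, (v, _) in zip(choice, items)))
    return best


items = [(60, 10), (100, 20), (120, 30), (40, 15), (70, 25)]  # (value, weight)
capacity = 50
best, pruned = solve_knapsack(items, capacity)
print(best, pruned, brute_force(items, capacity))
```

The quality of the heuristic only affects how much gets pruned, never the correctness of the answer, which is why a better guesser can be swapped in without weakening the guarantee.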

> I have a long standing hypothesis that an algorithmic solution to the global optimization problem is what lends action potentials the appearance, or essence?, of what we mean when we speak of “consciousness”.

Sounds like woo. Protein folding in bacteria and yeast works pretty similarly to how it works in humans. In fact, we can transfer genes from us to yeasts to produce many of the same proteins humans produce. But you'd be hard-pressed to argue that yeast are sentient.

This reminds me of how some people claim that soap films are super special because those films can solve optimisation problems. See eg https://highscalability.com/why-my-soap-film-is-better-than-... If you put soap film between a bunch of supports, even if the supports have complicated shapes, the soap film will tend to minimise its overall surface area.

Of course, if you look deeper into it, and do larger scale experiments, you figure out that the soap only assumes a local minimum.

trivexwe 764 days ago

> Sounds like woo.

Oh, definitely woo. I tried to make that explicitly clear by using “hypothesis” and “appearance”.

My hypothesis is less “optimization solutions == consciousness” and more positing that our brains (“action potentials” was meant as cheeky shorthand for the human brain) use an “optimization solution” that we identify as “consciousness”, or as you put it, “sentience”.

But to quote South Park, “and I base that on absolutely nothing”. ;P

eru 764 days ago

You might like https://scottaaronson.blog/?p=735 for some speculation on those topics with slightly more technical grounding. Direct link: https://www.scottaaronson.com/papers/philos.pdf

Especially the chapter: 'Computational Complexity and the Turing Test'

spywaregorilla 764 days ago

I feel like there should be a much stronger effort to solve optimization problems with ML enabled guesses. It's arguably the most important problem to be solving to improve ML itself.

Humans, for example, can provide extremely strong guesses by just eyeballing travelling salesman problems without doing any calculations. If we could use ML to take a problem and guess how to reformulate it with 95% of the search space cut out, we would be in a much stronger place. My gut says this should be theoretically possible, and it is probably the mechanism that biological learning systems use under the hood, to such great effect that it's OK to just use greedy and less efficient methods to do the last mile of optimization without something like backprop.

eru 761 days ago

Humans can mostly only do these kinds of guesses for travelling salesman problems embedded in 2d Euclidean space. But we have pretty good heuristics for these cases to kickstart a solver, too. Give a human a general graph with arbitrary edge weights, and they'll be dumbfounded.

(I don't think you even have to go all the way to an arbitrary graph, I suspect a decent sized graph with edge lengths embedded in 3d euclidean space will already confuse humans. Definitely once you get to 4d.)
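As a rough sketch of what "a heuristic to kickstart a solver" can look like in the 2d Euclidean case: nearest-neighbour construction followed by 2-opt local improvement. These are two textbook heuristics chosen for illustration, not a claim about what any particular solver uses, and the random point set is invented.

```python
import math
import random


def tour_length(points, tour):
    # Length of the closed tour, including the edge back to the start.
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def nearest_neighbour(points):
    # Greedy construction: always walk to the closest unvisited city.
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    # Local search: reverse a segment whenever that shortens the tour.
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(points, candidate) < tour_length(points, tour) - 1e-12:
                    tour, improved = candidate, True
    return tour


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]

start = nearest_neighbour(points)
better = two_opt(points, start)
print(f"nearest neighbour: {tour_length(points, start):.3f}")
print(f"after 2-opt:       {tour_length(points, better):.3f}")
```

The result is only a local optimum under segment reversal; an exact solver would still need bounds to prove how far it is from the best tour.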

spywaregorilla 759 days ago

My point is not that we should mimic humans. My point is that there are probably learnable but inexplicable heuristics for generally solving gradient descent problems, just from the formulations on their own, that a neural net would be good at learning.

ngrilly 764 days ago

> As a working hypothesis, protein folding assumes that a protein folds into the globally lowest energy configuration. [...] If there's a region in configuration space with a local minimum and high enough energy 'walls', this might be stable enough for the protein to be stable.

Sounds like gradient descent :)

eru 764 days ago

Well, or hill climbing in general.

dekhn 764 days ago

The working hypothesis you described was considered fairly obsolete some time ago. The current model is much more "most proteins fold to kinetically accessible states". The assumption of global lowest energy led to a lot of wasted effort and misled computer scientists. But along the way to this understanding we learned an awful lot about the forces that affect folding; see for example "hydrophobic collapse".

eru 761 days ago

Thanks!

nybsjytm 765 days ago

I'm a mathematician but tbh I have no clue what you mean by saying that arithmetic progressions of primes are "trivial" or analogous to anything here or in machine learning.

trivexwe 765 days ago

Yeah, the messaging got a little muddled, but the relation was purely analogical.

I was trying to point to a situation where you have a clear problem: a generating function for the prime number sequence; and a solution that identifies a small subset of the intended sequence without addressing, or even informing in any substantial way, the full breadth of the original problem.

> At the time of writing the longest known arithmetic progression of primes is of length 23, and was found in 2004 by Markus Frind, Paul Underwood, and Paul Jobling: 56211383760397 + 44546738095860 · k; k = 0, 1, ..., 22.
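The quoted progression is easy to check directly. The sketch below uses a Miller-Rabin test with a fixed set of bases; that fixed set is known to give a deterministic answer for all n below 2^64, which comfortably covers the roughly 10^15 values here. The helper name `is_prime` is just for this example.

```python
def is_prime(n):
    """Miller-Rabin with fixed bases; deterministic for all n below 2**64."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


START, STEP = 56211383760397, 44546738095860
terms = [START + STEP * k for k in range(23)]

print(len(terms), all(is_prime(t) for t in terms))
```

Verifying a given progression is cheap; it says nothing about how to find the next one, which is the sense of "trivial" being argued over in this subthread.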

“Trivial” was overloaded to imply both that calculating this subset is trivial (it is a simple arithmetic progression) and that this subset of the full prime number sequence is now trivial to produce.

In the same way that the Green-Tao theorem has yet to lead to a complete solution to the prime number sequence, I feel, the machine learning techniques will fail to lead to a complete solution to protein folding.

nybsjytm 764 days ago

It would be very hard to make a good analogy with this since the problem of "finding" arithmetic progressions is, as far as I know, of negligible interest compared to the structural knowledge of their existence. The situation is perfectly reversed in both computational biology and machine learning. But maybe I misunderstand what you mean by "a complete solution to the prime number sequence."

dekhn 764 days ago

This is a bit pedantic, but: AF says little to nothing about protein "folding". It is focused on static structure prediction. The history of this is a bit muddy but if you follow the details carefully you'll see that "protein folding" is a term that references the physical process by which proteins adopt their "final" conformations (or more accurately, interconvert between a bunch of accessible conformational states), while static structure prediction only cares about the final conformational state (possibly states).

Although many people say "protein folding problem" that's really referring to a different and far more complex problem than static structure prediction. What is the exact trajectory that a protein follows when moving from the fully unfolded state to the final states? What forces dominate that process? How do proteins overcome large barriers so quickly? To what extent does the cost of interacting with water dominate? What are the rates at which fully folded proteins interconvert between substates? Which proteins will never fold on their own, why, and how do they get folded by other proteins?