by cgiles 2202 days ago
I work in molecular biology research, and I think this is a great article that strikes at the heart of many problems in the field. I can't comment on the climate change stuff, although I wish he hadn't included it because it was almost certain to distract people from the overall point.

The problem is that there are no remotely comprehensive, predictive, and mathematical models of what goes on inside of cells. It is pure empiricism: you run an intervention, and see what happens. Write it up in a paper.

All well and good, except there are no viable models of what is happening inside that are predictive in the sense of being able to know what an intervention will do until you test it. We really need that if we want to develop treatments for molecular diseases that are more than marginally better.

The Santa Fe Institute, systems biology people, and others were working hard on this problem at the turn of the century, but progress has stalled. It's too hard. We don't know how to do it. A new "mathematical epistemology" that could handle this problem would be a huge step forward, if it is possible.

I can see why the author would extend this idea to things like economics or climate science. The thought in systems research was that, perhaps, different fields share similar underlying "complex systems" mechanisms, and if we can solve the problem in one area, we may have insights for how to do it elsewhere.


Thanks for this generous comment. The author of TFA has articulated a genuine problem that is central to many large-scale investigations these days, across many domains. We rely a lot on complex computer simulations, or complex physics-based models, that have a lot of fiddly details that are understood by only a limited set of people.

Yet, we want to learn from these models, and we want to reach conclusions from them. This has turned into a key problem for the scientific enterprise.

There are so many linked issues, some technical, some philosophical: Mere Monte Carlo state exploration is wasteful and doesn't provide much insight. Often we don't have error bars on model outputs to even know if an "improvement" in a metric is significant. There can be unknown unknowns that keep us from trusting our models completely.
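A minimal sketch of the error-bar point, assuming the model is a stochastic simulation that can be rerun with different seeds. `noisy_model` is a hypothetical stand-in for the real (expensive) simulation. The significance check compares the difference in means against two combined standard errors, which is a rough rule of thumb, not a formal test.

```python
import math
import random
import statistics


def noisy_model(param, rng):
    """Hypothetical stand-in for a stochastic simulation: metric = param + noise."""
    return param + rng.gauss(0.0, 1.0)


def mc_estimate(param, n_runs, seed):
    """Run the model n_runs times; return (mean, standard error of the mean)."""
    rng = random.Random(seed)
    samples = [noisy_model(param, rng) for _ in range(n_runs)]
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n_runs)
    return mean, sem


def improvement_is_significant(baseline, candidate, z=2.0):
    """Rough check: is the difference in means larger than z combined standard errors?"""
    (m0, s0), (m1, s1) = baseline, candidate
    return abs(m1 - m0) > z * math.sqrt(s0 ** 2 + s1 ** 2)


baseline = mc_estimate(param=1.00, n_runs=200, seed=1)
candidate = mc_estimate(param=1.05, n_runs=200, seed=2)
print(baseline, candidate, improvement_is_significant(baseline, candidate))
```

With these made-up numbers the true "improvement" (0.05) is smaller than the Monte Carlo noise floor, so without the error bars it would be easy to over-read whatever difference the two runs happen to show.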

It's a very rich and challenging problem space.

In my understanding, the Dept. of Energy was the first community to engage with these problems due to the test ban treaty. They had the mandate to ensure the nuclear stockpile works, despite not being able to fully test it. So they need models and they need to know how far to trust them.

One landmark reference for that is the NAS report on uncertainty quantification and complex models: https://www.nap.edu/catalog/13395/assessing-the-reliability-...

> Mere Monte Carlo state exploration is wasteful and doesn't provide much insight. Often we don't have error bars on model outputs to even know if an "improvement" in a metric is significant.

The funny thing is, I didn't check the author's name until just now. Ed Dougherty, who people below have derided as a "mere engineer", has been working on these problems forever. I'm honestly surprised he's still active or even alive: he was a graybeard when I heard his talk a decade ago. He is a bona fide systems biologist, one of the oldest ones.

At that time, his group was doing gene regulatory network inference on gene expression with ~600 genes. They were using the kind of approach (MC) you mention to infer a small subset of the overall network.

The main thing I took away from their results (at the time) is you can get multiple drastically different network topologies all with similar metrics on the objective function. This implies GRN inference was not inferring some kind of underlying reality. It also suggests you cannot accurately infer subnetworks, which in turn suggests cellular networks aren't all that modular.
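To make the non-identifiability point concrete, here is a toy sketch with made-up data (not the method Dougherty's group used): when two regulators are almost perfectly co-expressed, a network with the single edge A -> target and one with the single edge B -> target fit the data about equally well, even though the simulated "truth" uses both. The genes, noise levels, and one-edge fitting function are all hypothetical.

```python
import random


def fit_single_edge(x, y):
    """Least-squares fit of y ~ w * x (one regulator, no intercept); returns (w, mse)."""
    w = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    mse = sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(y)
    return w, mse


rng = random.Random(0)
n = 500
# Hypothetical expression data: gene A and gene B are almost perfectly co-expressed.
gene_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
gene_b = [a + rng.gauss(0.0, 0.05) for a in gene_a]
# The simulated "true" target is driven by BOTH regulators, plus measurement noise.
target = [a + b + rng.gauss(0.0, 0.5) for a, b in zip(gene_a, gene_b)]

w_a, mse_a = fit_single_edge(gene_a, target)  # topology 1: A -> target only
w_b, mse_b = fit_single_edge(gene_b, target)  # topology 2: B -> target only
print(f"A->target: w={w_a:.2f}, mse={mse_a:.3f}")
print(f"B->target: w={w_b:.2f}, mse={mse_b:.3f}")
```

Both wrong topologies score nearly the same error, so the objective function alone cannot tell you which edge (if either) is real. Scale that up to hundreds of genes and the space of equally good networks gets very large.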

So a distinction really should be drawn between models that are simply predictive and those that also model the underlying reality, which is even harder.

> We rely a lot on complex computer simulations, or complex physics-based models...we want to learn from these models, and we want to reach conclusions from them.

Not in molecular biology. There genuinely are no models like that except in very limited subfields like protein folding, and 99% of biologists would see them as mathematical mumbo-jumbo.

I see from your bio you're also in engineering research. You would not believe it if I told you how mathematically illiterate the average PhD biologist is. My PhD alma mater added a statistics course for the first time last year: a two-week summer course. Calculus I is "recommended" for admission. This is not unusual.

It isn't seen as needed, because state of the art research is basically all qualitative, with a quantitative veneer of t-tests overlaid on top. So I'm glad to hear other fields at least recognize the problem. Biology hasn't even got that far.

I also didn't care for the coarse characterizations nearby.

I take your point about the distinction between models that reproduce behavior ("simply predictive") vs. models of underlying components, and what you can learn from both.

This comes up in fields I work on with machine learning models vs. physics-based models. E.g., ML models that take a field of wind vectors at time t, and predict the wind at time t+1, vs. physical models that implement the flow equations. You can fit parameters of both flavors of models to match observations, but we certainly have more confidence in the robustness of the physics-based models.

About mathematically-challenged biologists - here's a hypothesis. I'll bet that if you started scanning conference abstracts in your domain for "uncertainty quantification," then some more carefully-posed modeling activities would crop up. (As you suggest, probably in the domains where more quantitative work is done.)

> we certainly have more confidence in the robustness of the physics-based models.

That is interesting. I don't know to what extent wind vectors are considered chaotic in the technical sense, but I would have guessed that chaotic systems would be more robustly modeled by ML instead of a physics approach. This is because I have a vague idea in my mind that ML would somehow compensate for the initial condition dependence in a way physics modeling would not. ML models tend to also have more parameters with smaller coefficients which I would identify with robustness (up to a point). I'm not gainsaying you, just expressing that I find this counterintuitive.
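For anyone unfamiliar with the "initial condition dependence" being discussed, the logistic map is the standard toy example of chaos. This sketch is not a wind model and says nothing about whether ML or physics models cope better; it only shows how fast a 1e-10 difference in the starting value grows.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # perturb the initial condition very slightly
for n in (0, 10, 20, 30, 40, 50, 60):
    print(n, abs(a[n] - b[n]))
```

At r = 4 the separation roughly doubles each step on average, so after a few dozen steps the two trajectories are effectively unrelated. Any model, learned or physics-based, inherits that limit on long-range prediction.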

Of course the physics models would provide more insight into the nature of the problem.

And more generally it is my understanding that one way to define the difference between a "complex system" and a "system" is that a complex system is not predictable by physics simulations because of emergent properties and so forth.

For this reason, I interpreted OP's call for a "mathematical epistemology" not so much as a call for more physics-based modeling, or for opaque ML models, but as an expression of the need for a (currently undefined) new type of mathematical language to model, describe, and predict complex emergent systems.

> I'll bet that if you started scanning conference abstracts in your domain for "uncertainty quantification," then some more carefully-posed modeling activities would crop up.

I'm sure you're right. I let my wistful longing that there would be more of this type of thinking in biology drag me into hyperbole suggesting that there is none of it.

I appreciate the pointers to terms and books that could get me up to speed on modeling. It's not really relevant to my primary area, but I do wish these approaches well from afar. And who knows, if I learn more, maybe I can apply more of this type of approach in my work. Getting audiences to understand it would be another task entirely...

Thanks for the info on the author, he has great articles!

https://asiatimes.com/2018/12/the-american-crisis-in-science...

He is a canary in the coal mine that our society ignores, kind of like how we ignored the warnings about a coronavirus over the last decade.

Sigh

Some mathematicians have come together to investigate systematically the science of composing and decomposing systems made of systems. There is a dire need for breakthroughs in this area all across society. https://www.azimuthproject.org/azimuth/show/HomePage

I am surprised that people in biology are trying to build predictive theories.

Even in physics, the moment one needs to deal with a number of strongly interacting components, we cannot calculate from first principles. We still have no theory for high-temperature superconductors. We cannot even calculate the properties of metallic hydrogen, and that is the simplest material that can exist: just a soup of electrons and protons.

Hasn't it always been like that? It took like fifteen years of sustained investigation just to identify DNA as the substrate of genetic transmission in bacteria. We've come a long way since then, but isn't it a bit much to expect that we'd already have effective, automated, catholic simulations of cellular biology?

A cell is a regulator, something that selectively responds to internal and external signals. It doesn't require new math, it requires modeling the signals and responses accurately. But it is a huge, parallel, continuous "state machine".

The problem with an empirical approach is that a cell in one state might respond to an intervention differently than it would in another state. This is especially true if the interventions act at the same level as the signals it normally responds to. (E.g., drowning it in a chemical that inhibits a certain reaction will likely always yield the same outcome, but a regulatory hormone might not.)
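A toy illustration of that state dependence, assuming a single self-activating gene with positive feedback (a textbook bistable switch, chosen for illustration, not a model of any real pathway). The same transient pulse of signal is applied to a cell in the "off" state and to one in the "on" state; `simulate`, `beta`, and the pulse timing are all hypothetical.

```python
def simulate(x0, pulse=1.0, pulse_start=5.0, pulse_end=7.0, t_end=40.0, dt=0.01, beta=3.0):
    """Euler-integrate dx/dt = s(t) + beta*x^2/(1+x^2) - x for a toy self-activating gene.

    s(t) is a square pulse of strength `pulse` between pulse_start and pulse_end.
    With beta = 3 the unstimulated system has stable states at x = 0 and
    x = (3 + sqrt(5)) / 2, separated by an unstable threshold near 0.38.
    Returns the final value of x.
    """
    x, t = x0, 0.0
    while t < t_end:
        s = pulse if pulse_start <= t < pulse_end else 0.0
        x += dt * (s + beta * x * x / (1.0 + x * x) - x)
        t += dt
    return x


low_state = 0.0
high_state = (3.0 + 5.0 ** 0.5) / 2.0  # stable "on" state when beta = 3
for x0 in (low_state, high_state):
    print(f"start {x0:.3f} -> end {simulate(x0):.3f}")
```

The identical intervention permanently flips the "off" cell on, while the "on" cell gets a transient bump and relaxes back to where it started. An experiment run on a population in only one of those states would report only one of those answers.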

What's missing from current mathematics to make predictive models for biology?

I did a search for "neural network cell simulation" and got a few hits, e.g. https://ieeexplore.ieee.org/document/8805421.

So it seems that people are working on the problem of predictability (or at least augmenting the researcher's/experimenter's ability to do some analysis ahead of time based on simplified models).

> What's missing from current mathematics to make predictive models for biology?

Well, I think that, no joke, there is a Nobel Prize waiting for anyone who knows the answer to that. I think this is the next big paradigm shift needed in biology, not to mention several other fields.

Who is to say that the problem is strictly mathematical, though? It could be that the math exists, but no one knows how to fit existing data into it, or it could be that there is not enough data, or the right kind of data, to make such a model yet. It could be that both the data and algorithm exists, but we need to turn the Earth into computronium to run it. Who knows?

> So it seems that people are working on the problem of predictability

I'm sure they are. They have been for decades. The last time I did a systematic review of this area was before the resurgence of neural networks, so I can't really say what is the latest progress, or whether the progress in ANNs can inform this problem. I suspect it's very possible.

The situation right now, as far as I know, is that: A) most biologists don't even know this is a problem, and B) those who do, don't have any idea what the solution is, or if one even exists (note the author of the linked article was pessimistic on that point).

Flops.

Cells balance right on the edge of Maxwell's Demon. Even a few thousand ions can change behavior radically. So, you are forced to track all the ions, proteins, lipids, etc. Which means you have to do a lot of atom-by-atom tracking. There are a few tricks here, but since the cell is not crystalline, you can't do a lot of fun physicsy math to get the problem to be easier.
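A common middle ground between atom-by-atom tracking and smooth rate equations is to count molecules and simulate individual reaction events stochastically (Gillespie's algorithm). This toy birth-death sketch, with hypothetical rates and function names, shows why low copy numbers matter: the cell-to-cell relative noise scales roughly as 1/sqrt(copy number), so a deterministic average hides large fluctuations in scarce species.

```python
import math
import random


def gillespie_birth_death(k_make, k_decay, t_end, rng):
    """Gillespie SSA for 0 -> X (rate k_make) and X -> 0 (rate k_decay * X).

    Starts from zero molecules; returns the copy number of X at time t_end.
    """
    t, x = 0.0, 0
    while True:
        a_make = k_make
        a_total = a_make + k_decay * x
        t += rng.expovariate(a_total)  # waiting time until the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_make:
            x += 1  # production event
        else:
            x -= 1  # decay event


def relative_noise(mean_copies, n_cells=200, seed=0):
    """Coefficient of variation of X across independently simulated cells."""
    rng = random.Random(seed)
    k_decay = 1.0
    samples = [
        gillespie_birth_death(mean_copies * k_decay, k_decay, 8.0, rng)
        for _ in range(n_cells)
    ]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return math.sqrt(var) / mean


for copies in (10, 400):
    print(copies, round(relative_noise(copies), 3))
```

The results should land roughly near 1/sqrt(copies). Note that even this shortcut costs one loop iteration per reaction event, which hints at where the flops go once you have thousands of species.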

Also, most of the time, since this is 'research' to begin with, you don't know what's in the cell. That's the point of looking. We've nearly no idea what all the proteins are in any given cell. DNA gives some guide, but a stochastic switch from coding to non-coding happens, constantly. So you don't know what all the proteins in a cell are, where they are, what they do, what they don't do, what the extracellular space is like, etc.

Cells are just really complicated. So you need a lot of flops.

How is "edge of Maxwell's Demon" related to "edge of chaos"?

Re: flops. I understand brute force is a good way to simulate dynamics but we constantly solve hard problems by approximation and have gotten pretty far with that approach. So what approximations have been tried and why have they been considered failures?

Also https://mobile.twitter.com/SteveStuWill/status/1268111230020...:

> "Scientists created fully functional mini-livers out of human skin cells, then successfully transplanted them into rats. The research is a proof-of-concept for potentially revolutionary technology and provides a glimpse of an organ donor-free future."

Wow!

That's unrelated to the original points, but I see plenty of innovative approaches to problems in biology. Simulating cells is just one way to figure them out, and we don't need to figure them out completely through computational means to put them to good use. Biology is already computronium, and if we can understand how to "program" it, then we don't need to simulate everything.

Thanks for sharing your insights and experience. Are there predictive models elsewhere in your field, or in science, that inspire this search? Also I'm wondering if there are predictive models for cellular sub-systems, for example the simpler stuff (afaik).

Is it that nobody knows how to make the jump from basic physics to predicting cell behaviour (sort of like the quantum-to-classical jump), or would predicting the behaviour just require too much compute power?

That's what everyone tries to do, but it's fatally flawed, especially since the practitioners are often clueless about some of the soft emergent behaviours in between (like chemistry). As Sydney Brenner quipped, modern systems biology is "low input, high throughput, no output" science. Or, as a CS person might say, it's the GIGO principle in action.