Hacker News
by redelbee 2132 days ago
At what point do we shift our investment in time and energy from building models like those mentioned in the article to the bigger picture? Maybe it’s just my perception, but it doesn’t seem like we have very many people thinking deeply about what models we should build and to what ends. Instead, we are just building the models and hoping we can put them to good use afterwards.

For example, what’s the end game for the cellular signaling modeling outlined in the article? It seems like the result isn’t valuable in and of itself, and it can’t be much more than that because the scientist “doesn’t understand it, and doesn’t think any person could.” So we now have an equation that expresses constants within a cell, and that’s it. We don’t understand it and we can’t put it to good use. So was that time and effort well spent? Do we just put this work in a drawer so we can pull it out if it ever becomes useful? Is that what we’re doing with all the similar advances in modeling?

There’s nothing wrong with knowledge for knowledge’s sake, but I think we’ve way over-indexed on the tools-and-predictions side of the system. If we keep creating new tools/models/predictions, we might find a use for them by chance. It just seems more efficient to focus on what outcomes we really want and then put the models to work in pursuit of those outcomes. Perhaps we focused more on outcomes in the past because we didn’t have the technological horsepower to constantly churn out new models.

Maybe I’m wrong and there are people working on the big picture. Are there modern day philosophers doing this work? Do they make up a significant portion of the work being done? If not, why?

> what’s the end game for the cellular signaling modeling outlined in the article?

Pharma.

Most of the modeling work that people do is fairly well motivated. Going from models to working technology is indeed a huge leap, but everything starts with the basic scientific understanding.

> Maybe I’m wrong and there are people working on the big picture.

You can usually find the "big picture" behind a paper by reading the recent grant applications of the PI whose grant funded the research (or the funding lines explicitly mentioned in the paper, if any).

Pharma may be the intended target for the signaling work, but as a data scientist who works in pharma, I can say with certainty that no biologist or chemist here would entertain for a minute any model that can't explain its mechanisms of action. Nor would the FDA, which wants any model not only to accurately predict the intended outcome but also to reflect awareness of the contextual circumstances that surround and lead to it.

No competent physician would be satisfied with a disembodied diagnosis. The constituent symptoms and assay metrics that support that diagnosis are essential to know, especially as disease is often complex and dynamic, and no single diagnostic label should ever hope to supplant a deeper understanding of each patient's unique mix of normality and abnormality. A diagnosis using ML may be a useful starting point in treatment, but it should never be the endpoint.

>no biologist or chemist here would entertain for a minute any model that can't explain its mechanisms of action.

Entertain? Who even really knows what that means at this point. But I'm fairly convinced that you'd be quite happy to have a theory-free "intuition pump" that could tell you "if you slow down binding with the following 3 membrane proteins, you see roughly double that effect on overall energy use by the cell".

The tool that generates this prediction may be completely unable to give you a "theory" about why this should be so, but then neither will the experiment(s) you do that confirm it to be true.

So, while indeed ML-style tools "should never be the endpoint", they can act as an incredibly useful intuition pump/launchpad for ideas and approaches that would otherwise remain inaccessible.

That's the mode of use for ML in most industries -- flagging stuff for follow-up by humans. Basically anything that's not real-time works like this.

Most uses of ML in real-time settings look more like hybrid systems -- a little dusting of ML on top of a whole heap of more traditional mathematical modeling/software engineering.

Outside of a few very niche settings, we're still a long way off from "trusting" ML in any meaningful sense.

> For example, what’s the end game for the cellular signaling modeling outlined in the article?

I think the article meant to refer to systems biology (as in a new field). It's not exactly a single 'model' but rather a methodology, as far as I know. Also, IMO the 'end game' in bioinformatics is mostly to discover new knowledge rather than to produce a 'production-ready' model. Through 'big data' science one can uncover hidden biological effects, new mechanisms, new insights, etc. Each of these big-data modelling exercises is really about pushing biology forward to a deeper level. In a way it is comparable to astrophysics (is it a coincidence that many people working in bioinformatics have an astrophysics background?).

> It just seems more efficient to focus on what outcomes we really want and then put the models to work in pursuit of those outcomes.

So which outcomes do we want? - and who is "we" anyway?

Figuring that out may be a hard problem by itself.

I was thinking of the human “we.” I think that’s my point: It’s hard work whether you work on models or focus on the bigger picture of what outcomes would be best for humanity. I think it makes more sense to work hard on the latter, or at least to work hard on it first and then build the models.

Going to plug a few relevant things here.

- A book I saw recommended here called "The Sciences of the Artificial," which talks about the purpose and practice of modeling with computers.

- An old post of mine, where I wrote that "creating knowledge is a philosophical act that businesses mostly didn't realize they were getting into when they got on the data science bandwagon."

- A post by HN user "wenc," a practicing data scientist. I'm going to copy-paste the whole thing because I think it's that good and relevant:

---

Data science is correctly valued when you realize how relatively unimportant it is. It is a small cog in a larger machinery (or at least it ought to be).

You see, decision-making involves (1) getting data, (2) summarizing and predicting, and (3) taking action. Continuous decision-making -- the kind that leads to impact -- involves doing this repeatedly in a principled fashion, which means creating a system around the decision process. For systems thinkers, this is analogous to a feedback control loop which includes sensor measurements + filters, controllers and actuators.

(1) involves programmers/data engineers who have to create/manage/monitor data pipelines (that often break). This is the sensor + filters part, which is ~40% of the system.

(2) involves data scientists creating a model that guides the decision-making process. This is the model of the controller (not even the controller itself!), which is ~20% of the system. Having the right model is great, but as most control engineers will tell you, even having the wrong model is not as terrible as most people think because the feedback loop is self-correcting. A good-enough model is all you need.

(3) involves business/front-line people who actually implement decisions in real life. This is where impact is delivered, ~40% of the system. This is the controller + actuator part, which makes the decisions and carries them out.

Most data scientists think their value is in creating the most accurate model possible in Jupyter. This is nice, but in real life it is not really that critical, because the feedback loop inherently moderates the error when deployed in a complex, stochastic environment. The right level of optimization would be to optimize the entire decision-making control feedback loop instead of just the small part that is "data science".

P.S. Data scientists who have particularly low impact are those who focus on producing one-off reports (like consultant reports). Reports are rarely read, and often forgotten. Real impact comes from continuous decision-making and implementing actions with feedback.

Source: practicing data scientist
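The "wrong model is not as terrible because the loop self-corrects" point can be illustrated with a toy simulation. This is a minimal sketch of my own, not anything from wenc's post: it assumes a made-up one-dimensional process y[k+1] = y[k] + b*u[k], where the true gain `b_true` differs from the model's estimate `b_model`. It compares acting once on the model's prediction (open loop, like a one-off report) with repeatedly measuring and correcting using the same wrong model (closed loop). The function names, the damping factor `alpha`, and all numbers are hypothetical.

```python
def open_loop(r: float, b_true: float, b_model: float) -> float:
    """One-shot decision: trust the model, act once, never measure again."""
    u = r / b_model      # action the model predicts will hit the target r
    return b_true * u    # outcome under the true (unknown) gain


def closed_loop(r: float, b_true: float, b_model: float,
                alpha: float = 0.5, steps: int = 50) -> float:
    """Repeated measure-decide-act loop that reuses the same wrong model."""
    y = 0.0
    for _ in range(steps):
        error = r - y                   # sensor: measure the gap to the target
        u = alpha * error / b_model     # controller: damped model-based correction
        y = y + b_true * u              # actuator: process responds with its true gain
    return y


if __name__ == "__main__":
    target = 100.0
    # Hypothetical case: the model underestimates the true gain by 60%.
    print("open loop:  ", open_loop(target, b_true=1.6, b_model=1.0))
    print("closed loop:", round(closed_loop(target, b_true=1.6, b_model=1.0), 6))
```

In this toy setup the error shrinks by a factor of (1 - alpha * b_true / b_model) each step, so the loop converges whenever 0 < b_true / b_model < 2 / alpha (below 4 with alpha = 0.5). That is the "good-enough model" caveat in concrete form: feedback tolerates a sizable model error, but not an arbitrarily bad one.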