Hacker News
by stanford_labrat 484 days ago
So I'm a biomedical scientist (in training I suppose...I'm in my 3rd year of a Genetics PhD) and I have seen this trend a couple of times now where AI developers tout that AI will accelerate biomedical discovery, based on one very specific argument: that AI will be smarter and generate better hypotheses than humans.

For example, in this Google essay they make the claim that CRISPR was a transdisciplinary endeavor, "which combined expertise ranging from microbiology to genetics to molecular biology", and this is the basis of their argument that an AI co-scientist will be better able to integrate multiple fields at once to generate novel and better hypotheses. For one, what they fail to understand as computer scientists (I suspect due to not being intimately familiar with biomedical research) is that microbio/genetics/mol bio are more closely linked than you may expect as a layperson. There is no large leap between microbiology and genetics that would slow down someone like Doudna or even myself - I use techniques from multiple domains in my daily work. These all fall under the broad domain of what I'll call "cellular/micro biology". As another example, Dario Amodei from Anthropic wrote something similar in his essay Machines of Loving Grace: that the limiting factor in biomedical research is a lack of "talented, creative researchers", a gap which AI could fill[1].

The problem with both of these ideas is that they misunderstand the rate-limiting factor in biomedical research, which to them is a lack of good ideas. This is very much not the case. Biologists have tons of good ideas. The rate-limiting step is testing all these good ideas with sufficient rigor to decide whether to keep exploring a particular hypothesis or abandon the project for something else. From my own work: I came up with the hypothesis driving my thesis over the course of a month or two. The actual amount of work prescribed by my thesis committee to fully explore whether or not it was correct? 3 years or so. Good ideas are cheap in this field.

Overall I think these views stem from field-specific nuances that don't necessarily translate. I'm not a computer scientist, but I imagine that in computer science the rate-limiting factor is not actually testing hypotheses but generating good ones. It's not like the code you write will take multiple months to run before you get an answer to your question (maybe it will? I'm not educated enough about this to make a hard claim). In biology, it is very common for one experiment to take multiple months before you know the answer to your question, or even whether the experiment failed and you have to do it again. But happy to hear from a CS PhD or researcher about this.

All this being said, I am a big fan of AI. I try to use ChatGPT all the time: I ask it research questions, ask it to search the literature and summarize findings, etc. I even used it literally yesterday to make a deep dive into a somewhat unfamiliar branch of developmental biology easier (and I was very satisfied with the result). But for scientific design, hypothesis generation? At the moment, useless. LLMs and other AI at this point are a very powerful version of Google and a code writer. And it's wrong something like 30% of the time to boot, so you have to be extremely careful when using it. I do think that wasting less time exploring hypotheses that are incorrect or bad is a good thing. But the problem here is that we can pretty easily identify good and bad hypotheses already. We don't need AI for that; what slows down research is the time it takes to actually test these hypotheses. Oh, and politics, which I doubt AI can magic away for us.

[1] https://darioamodei.com/machines-of-loving-grace#1-biology-a...


It's pretty painful watching CS try to turn biology into an engineering problem.

It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.

What is challenging is culling ideas, and having enough SNR in your readouts to really trust them.

> It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.

Maybe this kind of AI-based exploration would lower the costs. The more something is automated, the cheaper it should be to test many concepts in parallel.

A med chemist can sit down with a known drug and generate 50 analogs in LiveDesign in an afternoon. One of those analogs may have less CYP inhibition, or better blood-brain barrier penetration, or slightly higher potency or something. Or maybe they use an enumeration method and generate 50k analogs in one afternoon.

But no one is going to bring it to market because it costs millions and millions to synthesize, get through PK, ADMET, mouse, rat and dog tox, clinicals, etc. And the FDA won't approve marginal drugs, they need to be significantly better than the SoC (with some exceptions).

Point is, coming up with new ideas is cheap, easy, and doesn't need help. Synthesizing and testing is expensive and difficult.

But doesn't that mean that ranking the ideas to find the ones most worth testing is a useful problem to solve?

The one model that would actually make a huge difference in pharma velocity is one that takes a target (protein that causes disease or whatever), a drug molecule (the putative treatment for the disease), and outputs the probability the drug will be approved by the FDA, how much it will cost to get approved, and the revenue for the next ten years.

If you could run that on a few thousand targets and a few million molecules in a month, you'd be able to make a compelling argument to the committee that approves molecules to go into development (probability of approval * revenue >> cost of approval).
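
To make that inequality concrete, here is a rough Python sketch of the ranking step, assuming the three predictions already exist. Everything in it is hypothetical: the `Candidate` fields, the example numbers, and the `margin` of 3 standing in for ">>" are illustration only, not output from any real model or real drug program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One (target, molecule) pair with hypothetical model outputs."""
    name: str
    p_approval: float        # assumed predicted probability of FDA approval
    revenue_10y: float       # assumed predicted ten-year revenue, in $M
    cost_to_approval: float  # assumed predicted cost to get approved, in $M

    @property
    def expected_revenue(self) -> float:
        # probability of approval * revenue
        return self.p_approval * self.revenue_10y


def worth_developing(c: Candidate, margin: float = 3.0) -> bool:
    """Approximate 'p * revenue >> cost' as 'p * revenue >= margin * cost'.

    The margin is an arbitrary stand-in for '>>', not an industry number.
    """
    return c.expected_revenue >= margin * c.cost_to_approval


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by expected revenue per dollar of approval cost, best first."""
    return sorted(
        candidates,
        key=lambda c: c.expected_revenue / c.cost_to_approval,
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the functions.
    pool = [
        Candidate("mol-A", p_approval=0.10, revenue_10y=20_000, cost_to_approval=500),
        Candidate("mol-B", p_approval=0.25, revenue_10y=2_000, cost_to_approval=400),
        Candidate("mol-C", p_approval=0.05, revenue_10y=5_000, cost_to_approval=300),
    ]
    for c in rank(pool):
        print(c.name, round(c.expected_revenue), worth_developing(c))
```

The arithmetic is the trivial part. The whole argument rests on whether `p_approval`, `revenue_10y`, and `cost_to_approval` can be predicted at all.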

If you had a crystal ball that could predict the properties of the molecule, perhaps.