
by dekhn 786 days ago

I don't have a direct answer to your question. My guess is that LLMs are too limited to produce truly great solutions in biology, but sequential modelling is a key component that will not be replaced any time soon. For example, transformers were key to AlphaFold's success, but they still needed many other steps to make accurate predictions.

I worked on a predecessor to LLMs: hidden Markov models (HMMs) for protein modelling. They were, and for most people still are, the best way to model protein sequences. They are usually used for prediction rather than generation (i.e., you use the model to classify an unknown sequence into a known category, rather than asking it to generate new instances of a category). HMMs for proteins are a bit stuffy: they model local changes well but struggle with the long-range interactions that LLMs seem to excel at. For example, an HMM does a good job of letting you stuff a few more residues into a localized region of a protein such as a hinge, but is not so great at modelling groups of residues that are far apart in sequence space but close in protein space.
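
To make the "prediction rather than generation" point concrete, here is a minimal sketch of classifying a sequence by scoring it against one HMM per category with the forward algorithm. Everything in it is an assumption for illustration: the two-letter alphabet (H = hydrophobic, P = polar), the state names, the probabilities, and the function names are made up, and real protein HMMs are profile HMMs with match/insert/delete states, which this toy does not implement.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, model):
    """Return log P(seq | model) using the forward algorithm in log space.

    model is a dict with "start", "trans" and "emit" probability tables.
    All probabilities are assumed to be non-zero (no log(0) handling here).
    """
    start, trans, emit = model["start"], model["trans"], model["emit"]
    states = list(start)
    # Initialisation: probability of starting in each state and emitting seq[0].
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for symbol in seq[1:]:
        alpha = {
            t: log_sum_exp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][symbol])
            for t in states
        }
    # Termination: sum over the final states.
    return log_sum_exp(list(alpha.values()))

# Two hypothetical category models over the toy alphabet {H, P}.
MODELS = {
    "hydrophobic_rich": {
        "start": {"core": 0.5, "loop": 0.5},
        "trans": {"core": {"core": 0.9, "loop": 0.1},
                  "loop": {"core": 0.1, "loop": 0.9}},
        "emit": {"core": {"H": 0.9, "P": 0.1},
                 "loop": {"H": 0.5, "P": 0.5}},
    },
    "polar_rich": {
        "start": {"surface": 0.5, "loop": 0.5},
        "trans": {"surface": {"surface": 0.9, "loop": 0.1},
                  "loop": {"surface": 0.1, "loop": 0.9}},
        "emit": {"surface": {"H": 0.1, "P": 0.9},
                 "loop": {"H": 0.5, "P": 0.5}},
    },
}

def classify(seq, models=MODELS):
    """Assign seq to the category whose HMM gives it the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(seq, models[name]))

if __name__ == "__main__":
    for seq in ("HHHHHHHH", "PPPPPPPP"):
        print(seq, "->", classify(seq))
```

Note that each state only conditions on the previous state, which is one way to see why this kind of model handles local insertions gracefully but has no direct way to couple two positions that are far apart in the sequence.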

One detail of the bitter lesson is, imho, that statistical parrots are better than they "should" be, probably for the same reason that mathematics is unexpectedly proficient in modelling physics: to some degree, the models recapitulate the true latent space of the underlying system well enough to generalize outside the original observation space.