

by hn_throwaway_99 748 days ago

Agreed, would be interested if someone with more knowledge could comment.

My layman's understanding of LLMs is that they are essentially "fancy autocomplete". That is, you take a whole corpus of text, then train the model on the statistical relationships between the words (more accurately, tokens) in it, so that given a sequence of N tokens, the LLM predicts the most likely token at position N + 1. To generate whole sentences or paragraphs, you just repeat this process, feeding each predicted token back in as input.
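To make the "fancy autocomplete" idea concrete, here is a toy sketch. It is not how a real LLM works internally: it only counts which token follows which (a bigram model) and always picks the most frequent follower, whereas real models use a neural network over a much longer context and usually sample rather than always taking the top choice. The names `train_bigram` and `generate` are made up for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens):
    """Repeatedly append the most frequent follower of the last token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never seen with a follower
        out.append(followers.most_common(1)[0][0])
    return out


# Toy corpus, split on whitespace (an assumption; real tokenizers differ).
corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 4)))  # prints "the cat sat on the"
```

Nothing here depends on the tokens being words. Passing a list of single-letter amino acid codes instead of words would run the same way, which is roughly the sense in which a protein can be treated as "just a linear sequence of tokens".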

I certainly understand encoding proteins as just a linear sequence of tokens representing their amino acids, but how does that then map to a human-language description of the function of those proteins?

1 comments

changoplatanero 748 days ago

Most protein language models are not able to understand human-language descriptions of proteins. Mostly they just predict the next amino acid in a sequence, and sometimes they can understand certain structured metadata tags.


_heimdall 747 days ago

Can they understand the functional impact of different protein chains, or are they just predicting what amino acid would come next based on the training set with no concern for how the protein would function?
