by hn_throwaway_99
700 days ago
Agreed; I'd be interested if someone with more knowledge could comment. My layman's understanding of LLMs is that they are essentially "fancy autocomplete". That is, you take a large corpus of text and train the model to learn the statistical relationships between its words (more accurately, tokens), so that given a sequence of N tokens, the LLM predicts the most likely token at position N + 1. To generate whole sentences or paragraphs, you append that token to the sequence and repeat the process. I certainly understand encoding proteins as a linear sequence of tokens representing their amino acids, but how does that then map to a human-language description of the function of those proteins?
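To make the "fancy autocomplete" loop concrete, here is a toy Python sketch. It is a deliberate simplification, not how a real LLM works: it counts word-level bigrams and always picks the single most frequent follower of the *last* token. A real LLM uses a neural network conditioned on the whole preceding sequence and often samples from its predicted distribution instead of always taking the top token. The function names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_new_tokens):
    """Greedy generation: repeatedly append the most likely next token.

    Simplification: only the last token is used as context,
    unlike an LLM, which conditions on all N previous tokens.
    """
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])  # .get avoids inserting new keys
        if not followers:
            break  # this token was never followed by anything in training
        # most_common(1) -> [(token, count)]; ties go to the first-seen token
        out.append(followers.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 5)))
```

Starting from "the", the loop picks "cat" (seen twice after "the"), then "sat", and so on, feeding each chosen token back in as the new context, which is the repeat-until-done step described above.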