by Ukv 611 days ago
> When we ask an LLM a question, something fascinating happens. It scours through its vast training data [...]

At inference time, the model has no direct access to its training data (RAG setups aside). You could still argue that the model's weights encode probabilities to a similar end effect as if it were searching that data, but I'd be wary of taking the description too literally.

> the most probable sentence [...] it's essentially about frequency - how many times something appears in the training data.

I think this frequency view becomes a bit nebulous when the LLM is generating continuations of text that has never occurred before (which will be most cases once a system prompt is included).

Even for text that has occurred in its training data, there's no guarantee that a model which has seen "My name is Xavier and I play the tambourine" would complete "My name is Xavier and I play the" in the same way. It may well choose "xylophone", despite that word never occurring in that sentence, because it associates xylophone with instruments in its semantic space and because it alliterates with Xavier.

At the very least, the "most probable" continuation has to be worked out with respect to a large number of non-trivial rules, not just how frequently the phrase appears in the training data.
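
To make the contrast concrete, here's a rough sketch of what a literal "count how often it appears" predictor would look like. This is a toy lookup table, not how any real LLM works; the corpus, the prompts, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data", purely for illustration.
corpus = [
    "my name is xavier and i play the tambourine",
    "my name is anna and i play the xylophone",
    "my name is bob and i play the tambourine",
]

def build_counts(sentences):
    """For every exact prefix seen in training, count the word that followed it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words)):
            counts[tuple(words[:i])][words[i]] += 1
    return counts

def most_frequent_next(counts, prompt):
    """Return the most frequent continuation, or None if the prefix was never seen."""
    prefix = tuple(prompt.split())
    if prefix not in counts:
        return None
    return counts[prefix].most_common(1)[0][0]

counts = build_counts(corpus)

# A prefix that appears verbatim in the corpus: pure frequency just replays it.
seen_prompt = "my name is xavier and i play the"
seen = most_frequent_next(counts, seen_prompt)

# "xylophone" never followed this exact prefix, so its count is zero and a
# frequency-only predictor can never pick it here.
xylophone_count = counts[tuple(seen_prompt.split())]["xylophone"]

# Prepend a (made-up) system prompt and the prefix has never been seen at all,
# so a frequency-only predictor has nothing to say.
unseen = most_frequent_next(counts, "you are a helpful assistant " + seen_prompt)

print(seen, xylophone_count, unseen)
```

The lookup table can only replay "tambourine" for the seen prefix and returns nothing once the prompt is novel. An actual model still produces a sensible distribution in both cases, which is why I'd say frequency alone doesn't describe what it's doing.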