by nighthawk454
949 days ago
I think the analogy is something like this: a simple distribution over all words, with no context, is just word frequency, which is obviously not a good predictor. The 'information' needed to predict the correct next word is simply not there if you're predicting words in a vacuum. To be practically useful and predict the right words _in context_, the model must condition on more of the sentence/document (i.e. more information). So it should not be surprising that a 'glorified autocomplete' has some degree of "understanding": it would be impossible for it to be any good as an autocompleter otherwise.
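
To make the point above concrete, here is a minimal sketch in Python comparing a context-free word-frequency predictor with one that conditions on just the previous word. The toy corpus and the function names (`predict_unigram`, `predict_bigram`, `accuracy`) are made up for illustration, and a bigram counter is only a stand-in for "more context", not how any real language model works.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Context-free model: plain word frequency.
unigram = Counter(corpus)

# Context-conditioned model: counts of next word given the previous word.
bigram = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram[prev][nxt] += 1


def predict_unigram():
    """Always guess the single most frequent word, ignoring context."""
    return unigram.most_common(1)[0][0]


def predict_bigram(prev):
    """Guess the word that most often followed `prev` in the corpus."""
    return bigram[prev].most_common(1)[0][0]


def accuracy(predict):
    """Fraction of next-word positions in the corpus guessed correctly."""
    pairs = list(zip(corpus, corpus[1:]))
    hits = sum(1 for prev, nxt in pairs if predict(prev) == nxt)
    return hits / len(pairs)


unigram_acc = accuracy(lambda prev: predict_unigram())
bigram_acc = accuracy(predict_bigram)
print(f"unigram accuracy: {unigram_acc:.2f}")
print(f"bigram accuracy:  {bigram_acc:.2f}")
```

Even one word of context beats raw frequency on this toy data (scored on the same text it was counted from, so this is not a real evaluation). The same logic, extended to the whole sentence or document, is the argument in the comment.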