by PaulHoule 1082 days ago
It “understands” the prompt by passing the data through the neural network and activating individual neurons to a greater or lesser extent.

In the case of BERT models (which I know better), there is an activation for each token, and that activation captures the meaning of the token in context. You can average these over all the tokens in a document and get a vector similar to the document vectors used in information retrieval. Traditionally you would count how many times each word appears in a document and make a vector indexed by words, but the BERT vector can (1) find synonyms, since a word's vector typically lies close to the vectors of words with similar meanings, and (2) differentiate different meanings of a word, because the neuron activation is affected by the other words around it.
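A minimal sketch of that contrast, assuming made-up numbers: the per-token vectors in the hypothetical `TOY_VECTORS` table are 3-dimensional stand-ins, not real BERT output. A real model produces a much wider vector for each token *occurrence*, and that vector changes with the surrounding words, which a fixed lookup table cannot show. The helper names (`bag_of_words`, `mean_pool`, `doc_vector`) are also illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy "activations"; a real BERT model would compute these per
# occurrence, in context, rather than looking them up in a fixed table.
TOY_VECTORS = {
    "fast": [0.5, 0.5, 0.0],
    "quick": [0.45, 0.55, 0.0],
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "ripe": [0.0, 0.3, 0.7],
    "banana": [0.0, 0.1, 0.9],
}

def bag_of_words(doc: str) -> dict[str, int]:
    """Traditional IR vector: how many times each word occurs, indexed by word."""
    return dict(Counter(doc.lower().split()))

def cosine_sparse(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token vectors component-wise into one document vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def doc_vector(doc: str) -> list[float]:
    """Mean-pool the (toy) token vectors of every word in the document."""
    return mean_pool([TOY_VECTORS[t] for t in doc.lower().split()])

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    a, b, c = "fast car", "quick automobile", "ripe banana"
    # Word counts see no overlap between synonyms, so similarity is zero.
    print("bag-of-words:", cosine_sparse(bag_of_words(a), bag_of_words(b)))
    # Averaged vectors put the synonymous phrases close together...
    print("mean-pooled a/b:", round(cosine(doc_vector(a), doc_vector(b)), 3))
    # ...and the unrelated phrase far away.
    print("mean-pooled a/c:", round(cosine(doc_vector(a), doc_vector(c)), 3))
```

The point of the toy data is only point (1) above: the count vectors for "fast car" and "quick automobile" share no words and score 0, while the averaged vectors score close to 1. Point (2), telling apart different senses of the same word, needs a real contextual model.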

Activation of the neural network is the way it represents the input text, and I think “representation” is what is going on when it “understands,” insofar as it does.