|
|
|
|
|
by DoctorOetker
305 days ago
|
|
During the prompt embedding optimization, the embeddings are allowed to take on any vector in embedding space, instead one could use a continuous penalty for superposing tokens: Consider one of the embedding vectors in the input tensor: nothing guarantees its exactly on, or close to a specific token. Hence the probabilities with respect to each token form a distribution, ideally that distribution should be one-hot (lowest entropy) and worst case all equal probability (highest entropy), so just add a loss term penalizing the entropy on the quasitokens, to promote them to take on actual token values. |
|