| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by DoctorOetker 305 days ago
	During the prompt embedding optimization, the embeddings are allowed to take on any vector in embedding space, instead one could use a continuous penalty for superposing tokens: Consider one of the embedding vectors in the input tensor: nothing guarantees its exactly on, or close to a specific token. Hence the probabilities with respect to each token form a distribution, ideally that distribution should be one-hot (lowest entropy) and worst case all equal probability (highest entropy), so just add a loss term penalizing the entropy on the quasitokens, to promote them to take on actual token values.