| HN Mirror

global_step = 1377; phase = continuous; lr = 5.00e-03; average_loss = 0.609497
current tokens: ' Superman' '$MESS' '.");' '(sentence' '");' '.titleLabel' ' Republican' '?-'
global_step = 1956; phase = continuous; lr = 5.00e-03; average_loss = 0.589661
current tokens: ' Superman' 'marginLeft' 'iers' '.sensor' '";' '_one' '677' '».'
global_step = 2468; phase = continuous; lr = 5.00e-03; average_loss = 0.027065
current tokens: ' cited' '*>(' ' narrative' '_toggle' 'founder' '(V' '(len' ' pione'
global_step = 4871; phase = continuous; lr = 5.00e-03; average_loss = 0.022909
current tokens: ' bgcolor' '*>(' ' nomin' 'ust' ' She' 'NW' '(len' ' pione'

During the prompt embedding optimization, the embeddings are allowed to take on any vector in embedding space. Instead, one could add a continuous penalty for superposing tokens:

Consider one of the embedding vectors in the input tensor: nothing guarantees it's exactly on, or even close to, a specific token. Hence its probabilities with respect to each token form a distribution. Ideally that distribution would be one-hot (lowest entropy); in the worst case every token is equally probable (highest entropy). So just add a loss term penalizing the entropy of the quasi-tokens, to push them toward actual token values.
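
A rough sketch of what such a penalty might look like, in plain Python. How the per-token probabilities are obtained is not specified above, so this assumes a softmax over negative squared distances to each token embedding. The helper names, the `temperature` parameter, and the toy vocabulary are all made up for illustration.

```python
import math

def token_distribution(vec, token_embeddings, temperature=1.0):
    """Assumed definition: softmax over negative squared distances
    from `vec` to every token embedding."""
    logits = [
        -sum((v - t) ** 2 for v, t in zip(vec, tok)) / temperature
        for tok in token_embeddings
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_penalty(prompt_vectors, token_embeddings, temperature=1.0):
    """Mean entropy over all prompt positions (the quasi-tokens)."""
    ents = [
        entropy(token_distribution(v, token_embeddings, temperature))
        for v in prompt_vectors
    ]
    return sum(ents) / len(ents)

# Toy vocabulary of three 2-D token embeddings (hypothetical values).
vocab = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]

on_token = [[10.0, 0.0]]  # sits exactly on the second token
between = [[5.0, 5.0]]    # equidistant from all three tokens

low = entropy_penalty(on_token, vocab)   # close to zero
high = entropy_penalty(between, vocab)   # the maximum, log(3)
```

The penalty would then be added to the task loss with some weight, e.g. `loss = task_loss + weight * entropy_penalty(...)`, where `weight` is a hyperparameter that would need tuning. In a real run this would be written in the autodiff framework used for the optimization so that gradients flow back to the embeddings.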