by akoboldfrying
157 days ago
Cool! It creates very plausible encodings.

> The Llama tokenizer used in this project sometimes permits multiple possible tokenizations for a given string.

Not having tokens be a prefix code is thoroughly unfortunate. Do the Llama team consider it a bug? I don't see how to rectify the situation without a full retrain, sadly.
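To make the prefix-code point concrete, here's a rough sketch with a made-up toy vocabulary (not the real Llama vocab). The helper names `is_prefix_free` and `tokenizations` are just illustrative. If no token string is a prefix of another, a string can be split into tokens in at most one way. Once one token is a prefix of another (e.g. "a" and "ab"), the same text can have several valid splits:

```python
def is_prefix_free(vocab):
    """True if no token string is a proper prefix of another token string."""
    words = sorted(set(vocab))
    # In sorted order, if any word is a prefix of another,
    # it is also a prefix of its immediate successor.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


def tokenizations(text, vocab):
    """Enumerate every way to split text into tokens from vocab.

    Assumes the toy vocab contains no empty string (that would recurse forever).
    """
    if not text:
        return [[]]
    results = []
    for tok in vocab:
        if text.startswith(tok):
            for rest in tokenizations(text[len(tok):], vocab):
                results.append([tok] + rest)
    return results


print(tokenizations("ab", ["a", "b", "ab"]))         # [['a', 'b'], ['ab']]
print(len(tokenizations("acdb", ["a", "b", "cd"])))  # 1
```

With the non-prefix-free toy vocab, "ab" decodes two ways. With the prefix-free one, there's only ever a single split. Real tokenizers also apply merge rules or scoring to pick one split, but the underlying ambiguity is still there in the vocabulary.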