by tlrobinson
1165 days ago
I think yes, but more precisely, the tokens were chosen to optimize training on a dataset that's biased toward English content. I'm curious how the token set affects the quality of responses, setting aside the token-count factors mentioned in the post (cost, prompt expressivity, latency, etc.). Is it always better for the token set to be "native" to the majority of the training dataset and prompts/completions, or is it possible that some "intermediate representation" (in compiler terms) would work better?