by gsuuon
1046 days ago
There are also just far more tokens to train on if you go multi-language. I'd guess only the most popular languages would even have enough training data to get a specialized version, but it would still be an interesting trade-off for certain use cases. Being able to run a local code assistant with a 32k context window on a TypeScript-only project, for example, would really come in handy for a lot of people. I don't know enough to understand the impact of vocab size vs. context size.
The vocab size of Llama 2 is 32,000. Personally, I don't think there's enough difference between programming languages to save any meaningful number of tokens, considering the size of the current vocab.
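
For a rough sense of scale, here is a back-of-the-envelope sketch of how many parameters actually depend on the vocab size. The 32,000 figure is from above; the hidden size (4,096), the total parameter count (about 6.74B), and the untied input/output embeddings are my assumptions based on the 7B variant, and the halved vocab is purely hypothetical.

```python
# Rough estimate of how many model parameters depend on vocabulary size.
VOCAB_SIZE = 32_000      # Llama 2 vocab size, as stated above
HIDDEN_SIZE = 4_096      # assumption: width of the 7B variant
TOTAL_PARAMS = 6.74e9    # assumption: approximate size of the 7B variant


def vocab_params(vocab_size: int, hidden_size: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection.

    With tied weights the two matrices are shared, so only one is counted.
    """
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden_size


full = vocab_params(VOCAB_SIZE, HIDDEN_SIZE)
half = vocab_params(VOCAB_SIZE // 2, HIDDEN_SIZE)  # hypothetical halved vocab

share = full / TOTAL_PARAMS
saved = (full - half) / TOTAL_PARAMS

print(f"vocab-dependent params: {full:,} ({share:.1%} of total)")
print(f"saved by halving the vocab: {full - half:,} ({saved:.1%} of total)")
```

Under those assumptions the vocab-dependent matrices are only a few percent of the model, so even a drastic cut to the vocab would not free up much. It also says nothing about context length, which is a separate cost.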