|
|
|
|
|
by kacperlukawski
763 days ago
|
|
Why is that an issue? Training the tokenizer seems much more straightforward than training the model as it is based on the statistics of the input data. I guess it may take a while for massive datasets, but is calculating the frequencies impossible to be done on a bigger scale? |
|