by sailingparrot
702 days ago
> Can you change the tokenizer?

Yes. You can change it however you like, then look at Section 3.2 of the paper [1] to see which hyperparameters were used during training, and finetune the model to work with your new tokenizer using, e.g., the FineWeb [2] dataset. You'll need only a fraction of the training you would need if you trained from scratch with your tokenizer of choice. The weights released by Meta give you a massive head start and cost savings.

The fact that it's not trivial to do and is out of reach of most consumers is not a matter of openness. That's just how ML is today.

[1]: https://scontent-sjc3-1.xx.fbcdn.net/v/t39.2365-6/452387774_...
[2]: https://huggingface.co/datasets/HuggingFaceFW/fineweb
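To make the "head start" concrete, here is a minimal pure-Python sketch of one heuristic sometimes used when swapping tokenizers: initialize each new token's embedding as the mean of the old embeddings of the pieces the old tokenizer splits it into, then finetune from there. This is an assumption for illustration, not something taken from the paper or from Meta's release; `init_new_embeddings`, the toy character-level tokenizer, and the tiny 2-dimensional vectors are all hypothetical.

```python
# Hypothetical sketch: warm-start embeddings for a new tokenizer from the old one.
# Not the method from the paper; just one common-sense initialization heuristic.

def init_new_embeddings(new_vocab, old_tokenize, old_embeddings):
    """Build one embedding per new token by averaging old-token embeddings.

    new_vocab      -- list of token strings from the new tokenizer
    old_tokenize   -- function: str -> list of old token ids
    old_embeddings -- list of vectors (lists of floats), indexed by old token id
    """
    dim = len(old_embeddings[0])
    new_embeddings = []
    for token in new_vocab:
        old_ids = old_tokenize(token)
        if not old_ids:
            # Nothing to reuse: fall back to zeros (a real run might use small noise).
            new_embeddings.append([0.0] * dim)
            continue
        mean = [
            sum(old_embeddings[i][d] for i in old_ids) / len(old_ids)
            for d in range(dim)
        ]
        new_embeddings.append(mean)
    return new_embeddings


# Toy stand-in for the old tokenizer: character-level, three known characters.
old_vocab = {"a": 0, "b": 1, "c": 2}
old_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def old_tokenize(text):
    return [old_vocab[ch] for ch in text if ch in old_vocab]

# Toy stand-in for the new tokenizer's vocabulary, with merged tokens.
new_vocab = ["a", "ab", "abc", "z"]
new_embeddings = init_new_embeddings(new_vocab, old_tokenize, old_embeddings)

for token, vector in zip(new_vocab, new_embeddings):
    print(token, vector)
```

With a real model the same idea would apply to the input embedding matrix (and the output projection, if untied), after which the finetuning on a corpus such as FineWeb does the rest of the work.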