Hacker News
by Vetch 1442 days ago
Hi, I've been looking but can't find instructions on how to do tokenization. Where is the spm model: is it "flores200_sacrebleu_tokenizer_spm.model" or something else? Is it applied directly, or is it spm -> dict? And how do I prime the model for a specific language pair?
1 comment

We tokenize with the flores-200 spm model, correct. To generate from the model, check out the instructions here: https://github.com/facebookresearch/fairseq/tree/nllb/exampl...
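For anyone landing here later, a rough sketch of what applying an spm model looks like with the `sentencepiece` Python package. The helper names `load_spm` and `to_piece_lines` are made up for illustration, and writing one space-joined line of pieces per sentence is an assumption about the usual fairseq-style preprocessing, not something confirmed in this thread. The linked instructions are the authority on the actual pipeline.

```python
from typing import Iterable, List


def load_spm(model_path: str):
    """Load a SentencePiece model from disk (hypothetical helper).

    sentencepiece is imported lazily so that to_piece_lines() can be
    exercised with any object exposing a compatible encode() method.
    """
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_path)


def to_piece_lines(encoder, sentences: Iterable[str]) -> List[str]:
    """Encode each sentence into subword pieces, one space-joined line each.

    `encoder` only needs an encode(text, out_type=str) method that returns
    a list of string pieces, which SentencePieceProcessor provides.
    The space-joined output format is an assumption (see lead-in).
    """
    lines = []
    for sentence in sentences:
        # Strip the trailing newline/whitespace before encoding.
        pieces = encoder.encode(sentence.strip(), out_type=str)
        lines.append(" ".join(pieces))
    return lines
```

Usage would be something like `to_piece_lines(load_spm("flores200_sacrebleu_tokenizer_spm.model"), open("input.txt", encoding="utf-8"))`, where the model filename is the one guessed in the question and `input.txt` is a placeholder.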