Hacker News
by montebicyclelo, 1155 days ago
Hugging Face has good guides on tokenization and tokenizer training. BPE (used by GPT, for example) and WordPiece (used by BERT, for example) are two commonly used methods.
https://huggingface.co/learn/nlp-course/chapter6/5?fw=pt
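For a feel of what BPE training does, here is a minimal sketch in plain Python. It is not the Hugging Face implementation: the function names `merge_pair` and `train_bpe` and the toy word counts are made up for illustration, and it skips pre-tokenization, byte-level handling, and special tokens that real tokenizers need. The idea is just to repeatedly merge the most frequent adjacent pair of symbols.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return a copy of `symbols` with every adjacent occurrence of `pair` fused."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dict (hypothetical helper)."""
    # Start with every word split into single characters.
    splits = {word: list(word) for word in word_freqs}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # The most frequent pair becomes the next merge rule.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        splits = {word: merge_pair(symbols, best) for word, symbols in splits.items()}
    return merges, splits


# Toy corpus (assumed counts, for illustration only).
corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
merges, splits = train_bpe(corpus, num_merges=3)
print(merges)
print(splits)
```

WordPiece follows the same loop but scores candidate pairs differently, so the merges it picks can differ from BPE's on the same corpus.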