|
|
|
|
|
by gwern
1500 days ago
|
|
> The alternatives are learning at the character level (way more complex No, BPEs are more complex: you have a whole additional layer of preprocessing, with all sorts of strange and counterintuitive downstream effects and brand new ways to screw up (fun quiz question: everyone knows that BPEs use '<|endoftext|>' tokens to denote document breaks; what does the string '<|endoftext|>' encode to?). BPEs are reliably one of the ways that OA API users screw up, especially when trying to work with longer completions or context windows. But a character is a character. > and scales badly in memory/compute) Actually very competitive: https://arxiv.org/abs/2105.13626#google (Especially if you account for all the time and effort and subtle bugs caused by BPEs.) |
|