by ben_w
1191 days ago
My wild guess is that if it could get things done by tokenising like that all the time, they wouldn't need to also have word-like tokens. Whether that is an inference-time performance issue, a training-time performance issue, a model-size issue, or just total nonsense, I wouldn't know.