|
|
|
|
|
by sp332
1146 days ago
|
|
I would actually be less worried about a sequence of raw bytes than the tokens generated by BPE. If "Japan" is 01 and "language" is 02, then "Japanese" will probably be 03, which has no connection at all to 01 or 02. But raw, verbose encoding slows down convergence at the beginning. (Well, at least in English it does.) |
|