by colobas
1068 days ago
The model learns an embedding table, where (roughly) each row is the model’s internal representation of one token. The numbers in that table are learned. What isn’t learned is the map from a token (i.e. a combination of characters/byte-pairs) to its row index in the embedding table. That map is given by tokenization, which is fixed before the model is trained. EDIT: removed redundant bit
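To make the split concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: the toy `VOCAB`, the `<unk>` fallback, the 4-dimensional rows, and the `sgd_step` helper, which stands in for a real training update. It is not how any particular model or tokenizer is implemented.

```python
import random

# Fixed by tokenization (hypothetical toy vocabulary): token -> row index.
# Training never changes this mapping.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
EMBED_DIM = 4  # assumed row width, chosen arbitrarily


def init_embedding_table(vocab_size, dim, seed=0):
    """Create one row of small random numbers per token index."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)]


def token_to_index(token):
    """Look up the fixed row index; unknown tokens fall back to <unk>."""
    return VOCAB.get(token, VOCAB["<unk>"])


def embed(table, tokens):
    """Return the embedding row for each token, in order."""
    return [table[token_to_index(t)] for t in tokens]


def sgd_step(table, row, grad, lr=0.1):
    """Stand-in for a training update: only the numbers in a row change."""
    table[row] = [w - lr * g for w, g in zip(table[row], grad)]


table = init_embedding_table(len(VOCAB), EMBED_DIM)

index_before = token_to_index("cat")
row_before = list(table[index_before])

# Pretend gradient; in a real model this would come from backpropagation.
sgd_step(table, index_before, grad=[1.0] * EMBED_DIM)

index_after = token_to_index("cat")
row_after = table[index_after]
```

After the update, `index_before` and `index_after` are the same (the token still points at the same row), while `row_before` and `row_after` differ (the contents of that row moved).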