Hacker News
by colobas 1068 days ago
The model learns an embedding table, where (roughly) each row is the model’s internal representation of one token. The numbers in that table are learned. What isn’t learned is the map from a token (i.e. a particular combination of characters/byte-pairs) to its row index in the embedding table. That map is fixed by tokenization.
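
A rough sketch of that split, in plain Python so it runs without any ML library. Everything here is made up for illustration: the four-entry `VOCAB`, the `EMBED_DIM` of 4, and `fake_training_step`, which is only a stand-in for a real gradient update. The point is just which part changes during training and which part doesn't.

```python
import random

# Fixed by tokenization, never touched by training: token -> row index.
# (Hypothetical toy vocabulary, not taken from any real tokenizer.)
VOCAB = {"he": 0, "llo": 1, " wor": 2, "ld": 3}

EMBED_DIM = 4
rng = random.Random(0)

# Learned: one row of floats per token, randomly initialised.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB
]


def embed(tokens):
    """Look up each token's row through the fixed token -> index map."""
    return [embedding_table[VOCAB[t]] for t in tokens]


def fake_training_step(token, lr=0.1):
    """Stand-in for a gradient update: changes one row's numbers in place.

    The token -> index map (VOCAB) is not modified.
    """
    row = embedding_table[VOCAB[token]]
    for i in range(EMBED_DIM):
        row[i] -= lr * row[i]


vocab_before = dict(VOCAB)
row_before = list(embedding_table[VOCAB["llo"]])
fake_training_step("llo")
```

After the step, the numbers in the "llo" row have moved, but "llo" still points at row 1. In a real model the update comes from backprop rather than a fixed shrink, but the division of labour is the same.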

EDIT: removed redundant bit