| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by colobas 1068 days ago
	The model learns an embedding table, where (roughly) each row is used as the model’s internal representation of each token. The numbers in that table are learned. What isn’t learned is the map from token (i.e. combination of characters/byte-pairs) to row-index in the embedding table. That is given by tokenization EDIT: removed redundant bit