by sundarurfriend
1046 days ago
Another learner here, with one clarification that I think is useful even for beginners:

> A token is a unique integer identifier for a piece of text.

A token is a word fragment that's common enough to be useful on its own. For example, "writing", "written", and "writer" all contain "writ", so "writ" could be an individual token, and "writer" might be tokenized as "writ" and "er". Each token is then mapped to a small integer ID, and an embedding is where that ID gets turned into a vector of floating-point numbers.
character sequence (string) -> token (small integer) -> embedding (vector of floats)
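A toy sketch of that pipeline, for anyone who finds code easier to follow. Everything here is a made-up assumption for illustration: the tiny vocabulary `VOCAB`, the greedy longest-match splitting in the hypothetical `tokenize` helper, and the random embedding table. Real tokenizers (e.g. BPE) learn their vocabularies from data, and real embedding tables are learned during training rather than random.

```python
import random

# Hypothetical toy vocabulary; real ones contain tens of thousands of fragments.
VOCAB = ["writ", "er", "ing", "ten", "w", "r", "i", "t", "e", "n", "g"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}  # token -> small integer

def tokenize(text):
    """Split text into vocabulary fragments by greedy longest match (an assumption, not BPE)."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest substring starting at pos first, shrinking until one is in the vocab.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                tokens.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return tokens

# Embedding table: one vector of floats per token ID.
# Random placeholders here; a trained model learns these values.
EMBED_DIM = 4
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]

def encode(text):
    tokens = tokenize(text)                 # string -> token fragments
    ids = [TOKEN_TO_ID[t] for t in tokens]  # fragments -> small integers
    vectors = [EMBEDDINGS[i] for i in ids]  # integers -> vectors of floats
    return tokens, ids, vectors

if __name__ == "__main__":
    for word in ["writer", "writing", "written"]:
        tokens, ids, _ = encode(word)
        print(word, tokens, ids)
```

Running it, "writer" splits into "writ" + "er" with IDs 0 and 1, and all three words share the same "writ" token, so they also share the same embedding vector for that piece.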