by dvt
239 days ago
This isn't exactly true: tokens live in the embedding space, which is n-dimensional (256, 512, or whatever), so you might see one word, but it's actually an array of numbers. That said, I think it's pretty intuitive that continuous tokens are more efficient than discrete ones, simply because the LLM itself is basically a continuous function (with coefficients/parameters ∈ ℝ).
I would also argue that tokens are outside the embedding space, and that a large part of the magic of LLMs (and many other neural network types) is the ability to map sequences of rather crude inputs (tokens) into a more meaningful embedding space, and then map from that embedding space back to tokens we humans understand.
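
To make the token vs. embedding distinction concrete, here is a rough toy sketch of the round trip: discrete token IDs go in, get looked up as vectors of real numbers, and get mapped back to IDs by scoring against the same table. Everything here is made up for illustration (the vocabulary, the width of 8, the random unit-length rows, and reusing the embedding table for the way back); a real model learns the table and runs many layers between the two steps.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (hypothetical)
D_MODEL = 8  # embedding width, kept tiny here; the comment above mentions 256 or 512


def random_unit_vector(dim):
    """Random direction in R^dim, scaled to length 1."""
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Embedding table: one row of real numbers per token ID.
# Assumption: random rows stand in for learned ones.
EMBEDDINGS = [random_unit_vector(D_MODEL) for _ in VOCAB]


def embed(token_ids):
    """Discrete token IDs -> continuous vectors (just a table lookup)."""
    return [EMBEDDINGS[i] for i in token_ids]


def unembed(vectors):
    """Continuous vectors -> token IDs: pick the row with the largest dot product."""
    ids = []
    for v in vectors:
        scores = [sum(a * b for a, b in zip(v, row)) for row in EMBEDDINGS]
        ids.append(scores.index(max(scores)))
    return ids


token_ids = [VOCAB.index(w) for w in ["the", "cat", "sat"]]
vectors = embed(token_ids)    # this is what the network actually operates on
recovered = unembed(vectors)  # back to something a human can read

print(token_ids, "->", len(vectors), "vectors of width", len(vectors[0]), "->", recovered)
```

The rows are normalized so that each vector scores highest against its own row, which is why the round trip recovers the original IDs in this sketch. The interesting part in a real LLM is everything that happens to `vectors` between `embed` and `unembed`.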