by _akhe
795 days ago
> We now have a method of embedding a variable length piece of text into a fixed size vector

Question: Is it a rule that the embedding vector must be higher-dimensional than the source text? Ideally, 1 token -> a vector of length 1000+? I ask because the mechanism seems to lose value if I send in a 1000-character string and get back only, say, a length-4 embedding. Four metrics/features can't possibly describe such a complex statement. Doesn't the dimensionality of the embedding need to be higher than that of the source?
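To make the question concrete, here is a minimal toy sketch, not how any real embedding model works. It assumes a hashed bag-of-words scheme, and the names `toy_embed` and `cosine` are hypothetical, invented for illustration. It only shows that the output size is a parameter chosen independently of the input length, so the same text can be mapped to 4 or 1024 dimensions.

```python
import hashlib
import math


def toy_embed(text: str, dim: int) -> list[float]:
    """Toy embedding (hypothetical): hash each word into one of `dim` buckets.

    The result always has exactly `dim` entries, however long `text` is.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        # sha256 is used instead of hash() so results are stable across runs.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % dim
        vec[bucket] += 1.0
    # L2-normalize so the dot product below is a cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    short_text = "cat"
    long_text = "the quick brown fox jumps over the lazy dog " * 25  # over 1000 chars
    for dim in (4, 1024):
        # Both vectors have length `dim`, regardless of input length.
        print(dim, len(toy_embed(short_text, dim)), len(toy_embed(long_text, dim)))
        # With few buckets, unrelated words are forced to share dimensions,
        # so similarity scores are less able to tell texts apart.
        print(dim, cosine(toy_embed("stock market crash", dim),
                          toy_embed("chocolate cake recipe", dim)))
```

In this toy, a small `dim` does not break anything; it just forces many different words into the same few buckets, which is the loss of detail the question is worried about.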
GPT4 should be able to explain why.