| HN Mirror



	by _akhe 795 days ago
	> We now have a method of embedding a variable length piece of text into a fixed size vector

	Question: Is it a rule that the embedding vector must be higher-dimensional than the source text? Ideally 1 token -> a 1000+ length vector?

	I ask because it seems the mechanism would lose value if I sent in a 1000-character string and got back only, say, a 4-length embedding vector. Four metrics/features can't possibly describe such a complex statement, so I assumed the dimensionality of the embedding had to be higher than that of the source. Is that right?

1 comment

p1esk 794 days ago

No. The number of characters in a word has nothing to do with the dimensionality of that word’s embedding.

GPT4 should be able to explain why.
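
To make the point above concrete, here is a minimal toy sketch in Python. It is not how any real model computes embeddings: the hash-based `token_vector` is a hypothetical stand-in for a learned lookup table, and `DIM`, `token_vector`, and `embed` are names invented for illustration. All it shows is that the output size is fixed by a chosen dimension, not by the length of the input text.

```python
import hashlib
import struct

DIM = 8  # hypothetical embedding size, chosen only for illustration


def token_vector(token: str, dim: int = DIM) -> list[float]:
    """Deterministic pseudo-random vector for a token.

    Toy stand-in for a learned embedding table (an assumption, not a real model).
    """
    values = []
    for i in range(dim):
        digest = hashlib.sha256(f"{token}:{i}".encode("utf-8")).digest()
        # Map the first 4 bytes of the hash to a float in [-1, 1).
        (n,) = struct.unpack(">I", digest[:4])
        values.append(n / 2**31 - 1.0)
    return values


def embed(text: str, dim: int = DIM) -> list[float]:
    """Mean-pool the token vectors, so the result always has `dim` entries."""
    tokens = text.split()
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for tok in tokens:
        for i, v in enumerate(token_vector(tok, dim)):
            total[i] += v
    return [x / len(tokens) for x in total]


short_vec = embed("cat")
long_vec = embed("the quick brown fox " * 250)  # 1000 tokens
print(len(short_vec), len(long_vec))  # both lengths equal DIM
```

The dimension is a design choice made when the model is built. A one-word input and a thousand-word input both come out as vectors of the same length; how much useful information survives in that vector is a separate question from how long the input was.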