Hacker News
by danielmarkbruce 460 days ago
The LLM embeddings for a token cover much more than semantics. There is a reason a single token's embedding has so many dimensions.

You are conflating the embedding layer in an LLM with an embedding model for semantic search.
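
To make the distinction concrete, here is a rough toy sketch in plain Python. Everything in it is an assumption for illustration: the four-word vocabulary, the 8-dimensional width, the random weights, and the names `token_embeddings` and `text_embedding` are all made up. Mean pooling stands in for what a real embedding model does, which is to run a full encoder before pooling.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary and width; real LLMs use tens of thousands of
# tokens and thousands of dimensions per token.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "dog": 3}
DIM = 8

# Embedding layer: a plain lookup table with one learned row per token id.
# Random values stand in for trained weights here.
EMBEDDING_TABLE = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in VOCAB]


def token_embeddings(text):
    """Embedding layer: one vector per token, fed to the rest of the LLM."""
    return [EMBEDDING_TABLE[VOCAB[tok]] for tok in text.split()]


def text_embedding(text):
    """Embedding model stand-in: one unit-length vector for the whole text.

    Mean pooling over the lookup table is an assumption made for brevity;
    a real embedding model runs a full encoder before pooling.
    """
    rows = token_embeddings(text)
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


per_token = token_embeddings("the cat sat")  # three vectors, one per token
per_text = text_embedding("the cat sat")     # one vector for the whole text
similarity = cosine(per_text, text_embedding("the dog sat"))
```

The first function hands the network a sequence of per-token vectors to keep processing; the second collapses a text into a single vector whose only job is to be compared with other such vectors.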

1 comment

I don't think we're using the term semantic in the same way. I mean "relating to meaning in language."
The embedding layer in an LLM deals with much more than meaning. It has to capture syntax, grammar, morphology, style and sentiment cues, phonetic and orthographic relationships, and 500 other things that humans can't even reason about but that exist in word combinations.
I'll give you that. I was including those in "semantic space," but the distinction is fair.

My original point still stands: the space you've described cannot capture a full image of human cognition.