
by numlocked 287 days ago

I don’t quite understand. The article says things like:

“With the constant upward pressure on embedding sizes not limited by having to train models in-house, it’s not clear where we’ll slow down: Qwen-3, along with many others is already at 4096”

But aren’t embedding models separate from the LLMs? The size of attention heads in LLMs etc isn’t inherently connected to how a lab might train and release an embedding model. I don’t really understand why growth in LLM size fundamentally puts upward pressure on embedding size as they are not intrinsically connected.

3 comments

indeed30 287 days ago

I wouldn’t call the embedding layer "separate" from the LLM. It’s learned jointly with the rest of the network, and its dimensionality is one of the most fundamental architectural choices. You’re right though that, in principle, you can pick an embedding size independent of other hyperparameters like number of layers or heads, so I see where you're coming from.

However, the embedding dimension sets an upper bound on the rank of the token representation space. Each layer can transform or refine those vectors, but it can’t expand their intrinsic capacity. A tall but narrow network is bottlenecked by that width. Width-first scaling tends to outperform pure depth scaling: you want enough representational richness per token before you start stacking more layers of processing.

So yeah, embedding size doesn’t have to scale up in lockstep with model size, but in practice it usually does, because once models grow deeper and more capable, narrow embeddings quickly become the limiting factor.
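The rank argument in this comment can be checked on a toy case. The sketch below is only an illustration under stated assumptions: the sizes (`vocab_size`, `d_model`, `d_wide`) and the random matrices are made up, and it covers only a single linear map, not a full transformer with nonlinearities. It shows that projecting `d_model`-dimensional token vectors into a wider space does not raise their rank above `d_model`.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


# Hypothetical toy sizes, chosen only for illustration.
rng = random.Random(0)
vocab_size, d_model, d_wide = 12, 4, 10

# One row per token: a narrow embedding table.
embedding_table = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]
# A linear "widening" layer from d_model to d_wide dimensions.
up_projection = [[rng.gauss(0, 1) for _ in range(d_wide)] for _ in range(d_model)]

widened = matmul(embedding_table, up_projection)

embedding_rank = rank(embedding_table)
widened_rank = rank(widened)
print(embedding_rank, widened_rank)  # neither can exceed d_model
```

The widened vectors have 10 coordinates each, but they still lie in a subspace of dimension at most `d_model`, which is the bottleneck the comment describes.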


numlocked 287 days ago

I hear you, but the article is talking specifically about "embeddings as a product" -- not the embeddings that are within an LLM architecture. It starts:

> As a quick review, embeddings are compressed numerical representations of a variety of features (text, images, audio) that we can use for machine learning tasks like search, recommendations, RAG, and classification.

Current standalone embedding models are not intrinsically connected to SotA LLM architectures (e.g. the Qwen reference) -- right? The article seems to mix the two ideas together.


gojomo 287 days ago

The LLMs need the embedding function, benefit from growth, do the training – and then other uses get that embedding "for free".

So an old down-pressure on sizes – internal training costs & resource limits – is now weaker. And as long as LLMs are seeing benefits from larger embeddings, they'll become more common and available. (Of course, via truncation etc., no one is forced to use larger than works for them... but larger may keep becoming more common & available.)
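The truncation mentioned here can be sketched as follows. This is a minimal illustration, not any particular library's API: `truncate_embedding` is a hypothetical helper that keeps the leading dimensions and rescales to unit length so cosine similarity remains meaningful. It assumes the model was trained so that the leading dimensions carry most of the information (Matryoshka-style training is one such approach); for other models, quality after truncation may degrade noticeably.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` coordinates and rescale to unit length.

    Hypothetical helper for illustration only.
    """
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" standing in for much larger vectors.
full_a = [3.0, 4.0, 12.0]
full_b = [3.0, 4.0, -12.0]

short_a = truncate_embedding(full_a, 2)
short_b = truncate_embedding(full_b, 2)

print(cosine(full_a, full_b))    # similarity using all dimensions
print(cosine(short_a, short_b))  # similarity after dropping the last one
```

In this made-up pair the two vectors differ only in the dropped coordinate, so the truncated similarity is much higher than the full one. That is the trade-off in miniature: a smaller index, at the cost of whatever the discarded dimensions encoded.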


svachalek 287 days ago

All LLMs use embeddings; it's just that embedding models stop there, while for a full chat/completion model that's only the first step of the process. Embeddings are coordinates in the latent space of the transformer.
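A toy sketch of that distinction, with heavy simplification: every name here (`TOKEN_TABLE`, `OUTPUT_HEAD`, `encode`, `embed_text`, `next_token_logits`) is hypothetical, the weights are random, and `encode` is a bare table lookup standing in for whatever layers a real model runs before producing vectors. The point is only that both paths share the same vector-producing step, and the completion path goes on to map a vector to scores over the vocabulary.

```python
import random

rng = random.Random(0)

VOCAB = ["the", "cat", "sat", "mat"]
D_MODEL = 3

# Random stand-ins for learned weights.
TOKEN_TABLE = {tok: [rng.gauss(0, 1) for _ in range(D_MODEL)] for tok in VOCAB}
OUTPUT_HEAD = [[rng.gauss(0, 1) for _ in range(len(VOCAB))] for _ in range(D_MODEL)]


def encode(tokens):
    """Shared step: map each token to a vector in the latent space."""
    return [TOKEN_TABLE[tok] for tok in tokens]


def embed_text(tokens):
    """Embedding-model path: pool the token vectors and stop."""
    hidden = encode(tokens)
    return [sum(column) / len(hidden) for column in zip(*hidden)]


def next_token_logits(tokens):
    """Completion-model path: continue from the last vector to vocabulary scores."""
    last = encode(tokens)[-1]
    return [sum(h * w for h, w in zip(last, column)) for column in zip(*OUTPUT_HEAD)]


sentence = ["the", "cat", "sat"]
print(embed_text(sentence))         # one vector of length D_MODEL
print(next_token_logits(sentence))  # one score per vocabulary entry
```

Mean pooling is used here only because it is the simplest choice; real embedding models may pool differently.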
