HN Mirror

by minimaxir 710 days ago
One of the now-underdiscussed features of embeddings is that you can indeed use any existing statistical modeling technique on them out of the box and, as a bonus, avoid the common NLP preprocessing nuances and pitfalls (e.g. stemming) entirely. This post is a good example of why going straight to LLM embeddings for NLP is a pragmatic first step, especially for long documents.

1 comments

throw10920 710 days ago

You can apply statistical techniques to the embeddings themselves? How does that work?

mkl 710 days ago

You can apply statistical techniques to anything you want. Embeddings are just vectors of numbers which capture some meaning, so statistical analysis of them will work fine.
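
To make that concrete, here is a minimal sketch of one ordinary statistical technique, a nearest-centroid classifier, run directly on embedding vectors. The 4-dimensional vectors, the labels, and the helper names `fit_centroids` and `predict` are all made up for illustration; real embeddings would come from a model and have hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical 4-dimensional "embeddings" for six documents. The numbers are
# invented for illustration and do not come from any real embedding model.
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.7, 0.0, 0.2, 0.3],
    [0.1, 0.9, 0.8, 0.0],
    [0.2, 0.8, 0.9, 0.1],
    [0.0, 0.7, 0.7, 0.2],
])
labels = np.array([0, 0, 0, 1, 1, 1])  # assumed class label per document


def fit_centroids(X, y):
    """Return the class ids and the mean embedding (centroid) of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Assign each row of X to the class whose centroid is nearest (Euclidean)."""
    # Broadcasting gives an (n_samples, n_classes) matrix of distances.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


classes, centroids = fit_centroids(embeddings, labels)
new_doc = np.array([[0.85, 0.15, 0.05, 0.15]])  # hypothetical unseen document
predicted = predict(new_doc, classes, centroids)
```

Nothing here is specific to text: the classifier only ever sees rows of numbers, so the same code would work on any other fixed-length feature vectors.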

throw10920 710 days ago

Don't most statistical techniques rely on specific structure in the spaces containing the objects they operate on, in order to be useful?

mkl 710 days ago

Embeddings have structure, or they wouldn't be very useful. E.g. cosine similarity works because (many) embeddings are designed to support it.
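
As a rough sketch of the cosine-similarity point: the measure is the dot product of two vectors divided by the product of their lengths, so it compares direction and ignores magnitude. The three vectors below and their names (`cat`, `kitten`, `invoice`) are invented stand-ins for embeddings, not output from any real model.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 3-dimensional "embeddings", made up for illustration.
cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.35]
invoice = [0.1, 0.9, 0.0]

sim_close = cosine_similarity(cat, kitten)   # vectors pointing in similar directions
sim_far = cosine_similarity(cat, invoice)    # vectors pointing in different directions
```

With these made-up numbers `sim_close` comes out larger than `sim_far`, which is the kind of structure a model trained for cosine similarity is meant to give semantically related inputs.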

throw10920 710 days ago

Oh, that should have been obvious. Thank you for explaining.
