HN Mirror



	by minimaxir 1657 days ago
	I don't think it's more of a final boss thing: IMO working with embeddings/word vectors, even in the most basic case such as word2vec/GloVe, is easier to understand than some of the more conventional NLP techniques (e.g. bag of words/TF-IDF). The spaCy tutorials in the submission also have a section on word vectors.


Vetch 1657 days ago

Ah, although, TF-IDF is still good to know. Semantic search hasn't eliminated the need for classical retrieval techniques. TF-IDF can also be used to pick the subset of a document's words whose vectors get averaged into a document signature, a quick and dirty method for document embeddings.
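A minimal sketch of that TF-IDF-selected averaging trick, in plain Python. Everything here is an assumption for illustration: the toy corpus, the 3-d `vectors` table (made-up numbers, not from word2vec/GloVe or any real model), the helper names, and the unsmoothed `log(N / df)` form of IDF. Real libraries usually smooth IDF differently.

```python
import math
from collections import Counter

# Toy corpus, already tokenized (made up for illustration).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the stock market fell on monday".split(),
]

# Hypothetical 3-d "word vectors"; a real setup would load word2vec/GloVe.
vectors = {
    "cat": [0.9, 0.1, 0.0], "dog": [0.8, 0.2, 0.0],
    "mat": [0.5, 0.5, 0.0], "rug": [0.5, 0.4, 0.1],
    "sat": [0.4, 0.6, 0.0], "the": [0.3, 0.3, 0.3],
    "on": [0.3, 0.3, 0.3], "stock": [0.0, 0.1, 0.9],
    "market": [0.0, 0.2, 0.8], "fell": [0.1, 0.3, 0.6],
}

def idf(term, docs):
    # Unsmoothed inverse document frequency: log(N / df).
    df = sum(term in d for d in docs)
    return math.log(len(docs) / df)

def top_tfidf_terms(doc, docs, k):
    # Score each distinct term by (relative term frequency) * idf.
    tf = Counter(doc)
    scores = {t: (c / len(doc)) * idf(t, docs) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically so results are stable.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def doc_signature(doc, docs, k=2):
    # Average the vectors of the top-k TF-IDF terms, skipping
    # out-of-vocabulary words (e.g. "monday" has no vector here).
    terms = [t for t in top_tfidf_terms(doc, docs, k) if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not terms:
        return [0.0] * dim
    return [sum(vectors[t][i] for t in terms) / len(terms) for i in range(dim)]

signature = doc_signature(docs[0], docs)
```

Words that appear in every document ("the", "on") get an IDF of zero under this formula, so they drop out of the signature without needing a stopword list.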

Bag of word co-occurrences in matrix format is also nice to know: factorizing such matrices was the original vector space model for distributional semantics, and it provides historical context for GloVe and the like.
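For anyone who hasn't seen the co-occurrence approach, here is a rough sketch, assuming NumPy is available: count how often words share a context window, then factorize the count matrix with a truncated SVD and use the scaled left singular vectors as word vectors. The toy corpus, the window size, the `log1p` dampening, and the function names are all illustrative assumptions, not a reconstruction of any particular historical system.

```python
import numpy as np

# Toy corpus, already tokenized (made up for illustration).
sentences = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "the bank raised interest rates".split(),
    "the bank cut interest rates".split(),
]

vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

def cooccurrence_matrix(sentences, window=2):
    # M[a, b] counts how often word b appears within `window`
    # positions of word a in the same sentence. M is symmetric.
    M = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    M[index[w], index[s[j]]] += 1
    return M

def embed(M, dim=2):
    # Dampen raw counts with log(1 + x), then keep the top `dim`
    # singular directions as low-dimensional word vectors.
    U, S, Vt = np.linalg.svd(np.log1p(M))
    return U[:, :dim] * S[:dim]

M = cooccurrence_matrix(sentences)
word_vectors = embed(M, dim=2)  # one row per word in `vocab`
```

In practice the counts are usually reweighted (PPMI is a common choice) before factorizing, and the corpus needs to be far larger than this for the vectors to mean anything.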

mumblemumble 1656 days ago

> Bag of word co-occurrences in matrix format is also nice to know: factorizing such matrices was the original vector space model for distributional semantics, and it provides historical context for GloVe and the like.

And also, IIRC, still outperforms them on some tasks.