by andreasvc 2789 days ago
You're right that it is a representation, but it is also an instance of the vector space model of language. Coupled with a linear model for prediction, it is a strong baseline for text classification problems. See e.g. http://scikit-learn.org/stable/tutorial/text_analytics/worki...

So "bag of words" = "count/tf-idf vectorizer + logistic/ridge/lasso regression"?

Also: a vector space is a set of things that can be added and multiplied by a scalar. So a vector space model should be the proverbial representation where "queen - woman + man = king".

Am I being an insufferable pedant? I follow text analysis only very lightly and keep losing the thread.

Yes, in the context of text classification a bag-of-words model usually refers to that, or to the same features combined with some other linear model such as a linear SVM or naive Bayes.

The queen - woman example comes up when you try to model word semantics, such as with word2vec, where the vectors represent words. In a document classification task the vectors represent documents.
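
To make "the vectors represent documents" concrete, here is a minimal sketch using only the Python standard library. Each document becomes a vector of word counts over a shared vocabulary, and a linear model is trained on those vectors. Note the assumptions: a plain perceptron stands in for the logistic regression or linear SVM mentioned above (it is just the shortest linear classifier to write by hand), the four toy documents and their labels are made up for illustration, and all function names (`tokenize`, `build_vocabulary`, `vectorize`, `train_perceptron`, `predict`) are hypothetical helpers, not part of scikit-learn or any other library.

```python
from collections import Counter


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # One vector dimension per distinct word, in a fixed (sorted) order.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(doc, vocab):
    # Bag of words: word order is discarded, only per-word counts remain.
    # Words outside the vocabulary are ignored.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


def train_perceptron(vectors, labels, epochs=10):
    # Perceptron update rule; labels are +1 or -1.
    # Assumption: used here as a stand-in for any linear classifier.
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if score > 0 else -1
            if predicted != y:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(doc, vocab, weights, bias):
    # Linear decision rule on the document's count vector.
    x = vectorize(doc, vocab)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Toy data, invented for this sketch: +1 = positive review, -1 = negative.
docs = [
    "good great film",
    "great acting good plot",
    "bad boring film",
    "boring plot bad acting",
]
labels = [1, 1, -1, -1]

vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
weights, bias = train_perceptron(vectors, labels)

print(vocab)
print(vectors[0])
print(predict("great film", vocab, weights, bias))
print(predict("bad acting", vocab, weights, bias))
```

Every row of `vectors` is a whole document, not a word, which is the difference from the word2vec setting. Swapping the raw counts for tf-idf weights, or the perceptron for logistic regression, changes the numbers but not this overall shape.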