by 131012 3303 days ago
Would you care to elaborate?

My main doubt comes from the fact that the meaning of language may vary between contexts, but I am no expert and am earnestly curious about using NLP and ML with not-that-big data.

2 comments

Vector representations are useful for many natural language processing tasks. In word embeddings such as word2vec, GloVe, and fastText, the algorithm learns for each word an associated vector of dimension n: (x1, x2, ..., xn). A word may be close to another in one dimension (or subspace) but far away in another. Moreover, a good representation allows meaningful vector space arithmetic: King - Man + Woman ≈ Queen.

Word representations are typically trained on very large unlabeled corpora, but once the algorithm has learned the features you can reuse them for your small dataset.

EDIT: Added more explanation.
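
To make the vector arithmetic concrete, here is a minimal sketch in Python. The 3-dimensional vectors are hand-picked toy values, not output from word2vec, GloVe, or fastText, and the names EMBEDDINGS, cosine, and analogy are hypothetical helpers for illustration. Real embeddings have hundreds of dimensions and the analogy only holds approximately.

```python
import numpy as np

# Hypothetical toy embeddings, hand-picked for illustration only.
# The dimensions can loosely be read as (royalty, maleness, femaleness).
EMBEDDINGS = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.2, 0.2]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, which is
    the usual convention when evaluating word analogies.
    """
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

With pretrained vectors you would load the embedding table from disk instead of typing it in, and the same nearest-neighbor lookup can then be applied to the words of a small dataset.
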
This blog post[1] is a great example of positive unintended consequences of deep learning:

> We were very surprised that our model learned an interpretable feature, and that simply predicting the next character in Amazon reviews resulted in discovering the concept of sentiment.

[1]: https://blog.openai.com/unsupervised-sentiment-neuron/