HN Mirror (Hacker News)

by quentinp 4698 days ago

You can think of this as a square matrix W. Both dimensions of the matrix equal the vocabulary size: if we keep the 100k most frequent words in our corpus, W will be a 100k x 100k matrix.

The value of W(i,j) is the distance between words i and j, and row i of the matrix is the vector representation of word i. Research on word vectors is largely about computing W(i,j) efficiently, in a way that is also useful for natural language processing applications.
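
For a sense of scale, here is a rough sketch that assumes W is stored densely with 4-byte floats (the comment doesn't say how W is stored): a 100k x 100k matrix has 10^10 entries, about 40 GB. The toy vocabulary and distances below are invented purely to show that row i of W serves as the vector for word i.

```python
def dense_matrix_bytes(vocab_size: int, bytes_per_entry: int = 4) -> int:
    """Memory needed for a dense vocab_size x vocab_size matrix."""
    return vocab_size * vocab_size * bytes_per_entry


# 100k-word vocabulary with 4-byte float entries (an assumption).
size = dense_matrix_bytes(100_000)
print(f"{size:,} bytes = {size / 1e9:.0f} GB")

# Toy illustration with a hypothetical 3-word vocabulary.
vocab = ["mosquito", "insect", "mountaineer"]
index = {word: i for i, word in enumerate(vocab)}

# Hypothetical distances: W[i][j] is the distance between word i and word j,
# so W is symmetric with zeros on the diagonal.
W = [
    [0.0, 0.2, 0.9],  # row for "mosquito"
    [0.2, 0.0, 0.8],  # row for "insect"
    [0.9, 0.8, 0.0],  # row for "mountaineer"
]

# Row i of W is the vector representation of word i.
vector_for_insect = W[index["insect"]]
print(vector_for_insect)
```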

Word vectors are often used to measure similarity between words: since each word is represented as a vector, we can compute the cosine of the angle between the vectors of two words to find out how similar those words are.
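
A minimal sketch of that cosine similarity, dot(u, v) / (|u| |v|), in plain Python. The 3-dimensional vectors are made up for illustration; real word vectors are learned from a corpus and have many more dimensions.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical vectors, invented for this example.
vectors = {
    "mosquito": [0.9, 0.1, 0.0],
    "insect": [0.8, 0.2, 0.1],
    "mountaineer": [0.0, 0.3, 0.9],
}

# Values near 1 mean "pointing the same way"; values near 0 mean unrelated.
print(cosine_similarity(vectors["mosquito"], vectors["insect"]))
print(cosine_similarity(vectors["mosquito"], vectors["mountaineer"]))
```

Cosine ignores vector length and compares only direction, which is why it is a common choice for word similarity.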

rcfox 4698 days ago

Does that mean there actually is an answer for "What do you get when you cross a mosquito with a mountaineer?"

SergeyHack 4696 days ago

TL;DR: The answer to your query is a person named Chaudhry Sitwell Borisovich who is definitely an entomologist-hymnist and probably is also a mineralogist-ornithologist.

A Google search suggests that he was born in 1961.

I ran a few queries using the code and its default dataset, trying to use neutral words for subtraction: "mosquito -small +mountaineer", "mosquito -big +mountaineer", "mosquito -loud +mountaineer", "mosquito -normal +mountaineer", "mosquito -usual +mountaineer", "mosquito -air +mountaineer", "mosquito -nothing +mountaineer".

The most frequent words for these queries are:

6 times: "borisovich" "chaudhry" "entomologist" "hymnist" "sitwell"

5 times: "mineralogist" "ornithologist"
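
These queries boil down to vector arithmetic plus a nearest-neighbour search, followed by tallying which words recur across queries. A minimal sketch, assuming each query means vec(mosquito) - vec(neutral) + vec(mountaineer), that candidates are ranked by cosine similarity with the three query words excluded, and that the top k words of each query are counted. These are assumptions; the comment does not describe the tool's exact procedure. The helper names are hypothetical and the 3-dimensional vectors are invented, so this will not reproduce the results above; real runs use trained embeddings with far more dimensions.

```python
import math
from collections import Counter


def normalize(v):
    """Scale v to unit length (hypothetical helper; assumes v is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def nearest(query, vectors, exclude, k=2):
    """Return the k words whose vectors have the highest cosine similarity to query."""
    q = normalize(query)
    scored = []
    for word, vec in vectors.items():
        if word in exclude:
            continue
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def analogy(base, minus, plus, vectors, k=2):
    """Rank words near vec(base) - vec(minus) + vec(plus)."""
    query = [b - m + p for b, m, p in zip(vectors[base], vectors[minus], vectors[plus])]
    return nearest(query, vectors, exclude={base, minus, plus}, k=k)


# Invented vectors; a real run would load trained embeddings instead.
vectors = {
    "mosquito": [0.9, 0.1, 0.2],
    "mountaineer": [0.1, 0.9, 0.3],
    "small": [0.3, 0.0, 0.1],
    "big": [0.2, 0.1, 0.0],
    "entomologist": [0.6, 0.6, 0.3],
    "ornithologist": [0.5, 0.4, 0.6],
    "insect": [0.9, 0.0, 0.1],
}

# Count how often each word shows up across the "neutral" subtractions.
counts = Counter()
for neutral in ["small", "big"]:
    counts.update(analogy("mosquito", neutral, "mountaineer", vectors))
print(counts.most_common())
```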

Well done, sir.

Well you can dot them if not cross them (cross needs 3 dimensions iirc)!
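
The dot product works in any number of dimensions, while the familiar cross product is defined for 3-D vectors (and, more exotically, in 7-D), not for arbitrary dimensions. A small sketch with hypothetical helper names `dot` and `cross`, written in plain Python so it depends on no library:

```python
def dot(u, v):
    """Dot product: defined for vectors of any (equal) length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Standard cross product, implemented here only for 3-D vectors."""
    if len(u) != 3 or len(v) != 3:
        raise ValueError("this cross product needs two 3-D vectors")
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


print(dot([1, 2, 3, 4], [4, 3, 2, 1]))  # a 4-D dot product: prints 20
print(cross([1, 0, 0], [0, 1, 0]))      # x cross y: prints [0, 0, 1]
```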

defen 4698 days ago

You inadvertently stumbled onto the punchline of the joke - "You can't cross them because a mountaineer is a scalar." (scaler) - works better when spoken.

sp332 4697 days ago

Don't forget the part about the mosquito being a vector :)
