If you wanted it to know that "machine learning" and "neural networks" were related, wouldn't you need to do some type of entity extraction first, since Word2vec is run on tokens?
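from gensim.models.phrases import Phrases
# corpus must be an iterable of tokenized sentences (lists of strings)
bigrams = Phrases(corpus)
corpus_with_bigrams = bigrams[corpus]  # frequent pairs are joined into single tokens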
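Or you could rank bigrams by count(w1 w2)^2 / (count(w1) * count(w2)), where count(w1 w2) is the number of times w1 is immediately followed by w2.
Many variations on this formula work, but the idea is to compare the count of the bigram to the counts of its unigrams: a pair that occurs together about as often as its words occur at all is probably a real phrase.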
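As a rough illustration of that ratio, here is a minimal sketch in plain Python. The function name `rank_bigrams`, the `min_count` cutoff and the toy corpus are my own assumptions for the example, not part of gensim or of the formula above.

```python
from collections import Counter

def rank_bigrams(sentences, min_count=2):
    """Rank adjacent word pairs by count(w1 w2)^2 / (count(w1) * count(w2)).

    min_count is an assumed cutoff that drops pairs seen too rarely
    to give a meaningful score.
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        unigram_counts.update(tokens)
        # zip pairs each token with its immediate successor
        bigram_counts.update(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), n in bigram_counts.items():
        if n < min_count:
            continue
        scores[(w1, w2)] = n * n / (unigram_counts[w1] * unigram_counts[w2])

    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus, made up for illustration only
toy_corpus = [
    "machine learning uses neural networks".split(),
    "neural networks power machine learning".split(),
    "the machine was learning the networks".split(),
]
ranked = rank_bigrams(toy_corpus)
for pair, score in ranked:
    print(pair, round(score, 3))
```

On a real corpus you would probably raise `min_count` and keep only the pairs above some score threshold, which you have to tune by eye.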
By the way, you run bigram identification before training Word2Vec so that each detected bigram becomes a single token and gets its own vector.
Besides this method, there is another good way to identify n-grams: use Wikipedia titles. It's quite an extensive list that covers most of the important named entities, locations and multi-word topic names. Or go directly to http://wiki.dbpedia.org/ for a huge list with millions of n-grams. Cross-reference it with your text corpus and you get a nice clean list.
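The cross-referencing step could look something like the sketch below. It assumes you have already loaded the titles into a lowercase set; the `known_titles` set here is a tiny hand-written stand-in, not real Wikipedia or DBpedia data, and `merge_known_ngrams`, `max_len` and the underscore joiner are names and choices I made up for the example.

```python
def merge_known_ngrams(tokens, known_titles, max_len=3):
    """Greedily join the longest run of tokens that matches a known title.

    known_titles: set of lowercase, space-separated titles (assumed format).
    max_len: longest n-gram to try, an assumed limit to keep lookups cheap.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to bigrams
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in known_titles:
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word title starts here, keep the single token
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical stand-in for a title list loaded from a dump
known_titles = {"machine learning", "neural networks", "new york city"}
tokens = "i study machine learning and neural networks in new york city".split()
merged_tokens = merge_known_ngrams(tokens, known_titles)
print(merged_tokens)
```

The merged tokens can then be fed to Word2Vec like any other tokenized sentence, so every matched title gets its own vector.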