by judk 4641 days ago
Word2vec seemed intuitively obvious to me, but I really have a hard time believing that it works in only 1000 dimensions and generates good results beyond cherry-picked demo examples.

Are there really only 1000 independent concepts in the English language?

3 comments

No, but with n binary dimensions (each with value 0 or 1) you can encode 2^n unique identifiers.

So with 1000 continuous dimensions (typically values between -1 and 1 coded on 32 bit floats) you can encode quite a bunch of concepts and their nuances.
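To put rough numbers on that counting argument, here is a minimal sketch. The helper name `distinct_codes` is made up for illustration, and it only counts raw bit patterns: an upper bound that ignores NaNs and says nothing about how much of that space a trained model actually uses.

```python
def distinct_codes(n_dims: int, bits_per_dim: int) -> int:
    """Upper bound on distinct vectors: every bit pattern counted once."""
    return 2 ** (n_dims * bits_per_dim)

# n binary dimensions give 2^n identifiers.
print(distinct_codes(10, 1))  # 1024

# 1000 dimensions stored as 32-bit floats: 2^32000 raw bit patterns.
# The number is far too large to print usefully, so report its size in bits.
total_bits = distinct_codes(1000, 32).bit_length() - 1
print(total_bits)  # 32000
```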

Note: the default dimensionality of word2vec is 100, not 1000. Apparently you can get better results with dim=300 and a very large training corpus. To benefit from higher dimensions you need more CPU time to reach convergence and a lot more data to make use of the added model capacity.

I'm still impressed it only takes 26 letters, in words of average size around 5! By comparison, 1000 continuous dimensions seems positively resplendent with expressiveness.

FWIW, 2^24 > 26^5, so even the binary vector space of size 2^1000 is about 2^976 times larger than 26^5 (roughly all possible words of up to 5 letters).
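This kind of estimate is easy to check directly. A small sketch using only Python's standard library; the variable names are just for illustration:

```python
import math

five_letter = 26 ** 5                               # strings of exactly 5 letters
up_to_five = sum(26 ** k for k in range(1, 6))      # strings of 1 to 5 letters

print(five_letter)   # 11881376
print(up_to_five)    # 12356630

# Both counts fit in 24 bits.
bits_needed = math.ceil(math.log2(up_to_five))
print(bits_needed)   # 24

# How many times larger is 2^1000, as a power of two (rounded down)?
ratio_exponent = math.floor(1000 - math.log2(five_letter))
print(ratio_exponent)  # 976
```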

Yes, but there are exponentially more concepts than words. The words we have are a sparse set of labels for particularly relevant combinations.

But yeah, the continuous dimensions can hide many more binary dimensions.

For example, 4-D rgba can be smashed into 1 continuous (or 64-bit) dimension, but that feels a bit like cheating.
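A minimal sketch of that packing trick, assuming 16 bits per channel so the four channels fill exactly 64 bits (8-bit channels would fit in 32). The function names `pack_rgba` and `unpack_rgba` are hypothetical, not from any library:

```python
def pack_rgba(r: int, g: int, b: int, a: int, bits: int = 16) -> int:
    """Pack four channel values into one integer, r in the highest bits."""
    mask = (1 << bits) - 1
    value = 0
    for channel in (r, g, b, a):
        if not 0 <= channel <= mask:
            raise ValueError(f"channel value {channel} does not fit in {bits} bits")
        value = (value << bits) | channel
    return value


def unpack_rgba(value: int, bits: int = 16) -> tuple:
    """Recover the four channels from a packed integer."""
    mask = (1 << bits) - 1
    channels = []
    for _ in range(4):
        channels.append(value & mask)  # lowest bits hold the last channel packed
        value >>= bits
    return tuple(reversed(channels))


packed = pack_rgba(65535, 0, 32768, 1)
print(packed < 2 ** 64)     # True
print(unpack_rgba(packed))  # (65535, 0, 32768, 1)
```

The round trip is lossless, which is why it feels like cheating: the single number still carries four independent values, and nearby numbers no longer correspond to nearby colors.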

So it sort of feels like "1000 64-bit dimensions" is a tricky name. It is really more like 64,000 1-bit dimensions.

I wouldn't be surprised if you could cover most basic English with 1000 concepts. That would give a lot of combinations.