by michalg82 1129 days ago
I'm using these vectors, whose latest version is from 2019:

https://github.com/commonsense/conceptnet-numberbatch

I guess the data used to build those vectors doesn't contain many occurrences of those two words in relation to each other.

Anyway, that's a downside of the word-vector idea. There will always be some words that we humans consider more or less related than the word vectors do.

I tried to find the best one. It's different from what Semantle uses (word2vec from Google) and from what Contexto uses (GloVe). But there are probably still many word pairs that could match better.

1 comments

What about using those three models and returning the best score among them?
That's a really interesting idea. But what if it produces too many "false positives"? Maybe too many word pairs would be considered more related than one would expect.
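
For what it's worth, here is a rough sketch of the "best score among the models" idea. Everything in it is an assumption for illustration: the 2-d vectors are made up, the dictionary keys are just labels, and `cosine` and `best_score` are hypothetical helpers, not anything the real game uses. Real vectors would have to be loaded from the Numberbatch, word2vec and GloVe files.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_score(models, w1, w2):
    """Highest cosine similarity for (w1, w2) over the models that know both words.

    Returns None if no model contains both words.
    """
    scores = [
        cosine(vectors[w1], vectors[w2])
        for vectors in models.values()
        if w1 in vectors and w2 in vectors
    ]
    return max(scores) if scores else None

# Made-up toy vectors, one small table per model (hypothetical data).
models = {
    "numberbatch": {"cat": [1.0, 0.0], "dog": [0.0, 1.0]},
    "word2vec":    {"cat": [1.0, 0.0], "dog": [1.0, 1.0]},
    "glove":       {"cat": [1.0, 0.0], "dog": [1.0, 0.0]},
}

print(best_score(models, "cat", "dog"))
```

Taking the maximum can only raise a pair's score compared to any single model, which is exactly the "false positives" worry: one model that overrates a pair is enough. Scores from different models are also not necessarily on the same scale, so averaging them or comparing per-model ranks might be safer than taking the raw maximum.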