by zeven7 1135 days ago
Great idea for a game!

A question I ran into while playing your game is why it says "Amazon" and "Prime" are only 3% related? That seems very surprising.

1 comments

I'm using these vectors, whose latest version is from 2019:

https://github.com/commonsense/conceptnet-numberbatch

I guess the data used to build those vectors doesn't contain many occurrences of those two words in relation to each other.

Anyway, that's the downside of the word-vector approach. There will always be some word pairs that we humans consider more or less related than the vectors do.

I tried to find the best model. This one is different from what Semantle uses (Google's word2vec) and from what Contexto uses (GloVe). But there are probably still many word pairs that could match better.

What about using those three models and returning the best score among them?
That's a really interesting idea. But what if it produces too many "false positives"? Too many word pairs might be scored as more related than one would expect.
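
For anyone who wants to try the "best of three" idea, here is a minimal sketch. It assumes the relatedness score is the cosine similarity between two word vectors, which the thread doesn't confirm. The names `cosine_similarity`, `best_score`, `model_a`, and `model_b` are made up for illustration, and the two-dimensional vectors are toy values, not real Numberbatch, word2vec, or GloVe data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_score(word1, word2, models):
    """Return the highest similarity any model gives the pair, or None if no model knows both words."""
    scores = [
        cosine_similarity(vectors[word1], vectors[word2])
        for vectors in models.values()
        if word1 in vectors and word2 in vectors
    ]
    return max(scores) if scores else None

# Toy stand-ins for three real embedding tables (hypothetical values).
models = {
    "model_a": {"amazon": [1.0, 0.0], "prime": [0.0, 1.0]},
    "model_b": {"amazon": [1.0, 0.0], "prime": [1.0, 0.0]},
}

print(best_score("amazon", "prime", models))
```

Taking the maximum can only raise a pair's score, never lower it, which is exactly where the false-positive worry comes from. Swapping `max` for an average of the scores would be a more conservative variant.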