by rspeer
4022 days ago
I would insist on a better dataset before really calling these "semantic analogies" (and don't just take my word for it: Chris Manning complained about exactly this in his recent NAACL talk). The only semantics that it tests are "can you flip a gendered word to the other gender", which is so embedded in language that it's nearly syntax; and "can you remember factoids from Wikipedia infoboxes", a problem that you could solve exactly using DBPedia. Every single semantic analogy in the dataset is one of those two types. The syntactic analogies are quite solid, though.
|
That's a simplification. For example, I have trained vectors on Wikipedia dumps without infoboxes, and queries such as Berlin - Deutschland + Frankreich work fine.
Of course, even the remainder of Wikipedia is nice text in that it will contain sentences such as 'Berlin is the capital of Germany'. So, indeed, it makes doing typical factoid analogies easier.
That said -- I am more interested in the syntactic properties :).
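
For anyone unfamiliar with what a query like Berlin - Deutschland + Frankreich means, here is a minimal sketch of the vector arithmetic. The four-dimensional vectors are hand-made toy values chosen purely for illustration (they are not trained embeddings and not from my Wikipedia models), and the `analogy` helper is a hypothetical name, not part of any library.

```python
import numpy as np

# Toy, hand-made vectors (an assumption for illustration only).
# Dimensions: [is-a-capital, Germany, France, Italy]
EMBEDDINGS = {
    "Berlin":      np.array([1.0, 1.0, 0.0, 0.0]),
    "Deutschland": np.array([0.0, 1.0, 0.0, 0.0]),
    "Paris":       np.array([1.0, 0.0, 1.0, 0.0]),
    "Frankreich":  np.array([0.0, 0.0, 1.0, 0.0]),
    "Rom":         np.array([1.0, 0.0, 0.0, 1.0]),
    "Italien":     np.array([0.0, 0.0, 0.0, 1.0]),
}


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word whose vector is closest (by cosine) to a - b + c.

    The three query words are excluded from the candidates, which is the
    usual convention when evaluating analogy queries.
    """
    query = embeddings[a] - embeddings[b] + embeddings[c]
    query = query / np.linalg.norm(query)

    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        score = float(np.dot(query, vec / np.linalg.norm(vec)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


print(analogy("Berlin", "Deutschland", "Frankreich"))
```

With these toy values the query vector coincides with the one for Paris, so that is the word returned. With real trained vectors the match is only approximate, which is why the nearest-neighbour search is needed.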