by deviate_X 3520 days ago
It seems to be quite opinionated

https://concept.research.microsoft.com/Home/Demo?instance=hi...

it would be interesting to know more about how the graph is formed, and how it avoids "gaming" the engine

the probase link is giving me a 400 error

2 comments

My guess is it only parses certain word forms. "Blatant state-shtuppers" is in this blog post: http://www.transterrestrial.com/?p=63723

> "Let’s put blatant State-shtuppers such as Hillary, Bernie, and Obama at about 7 or an 8."

This matches Hearst Pattern #1 from https://www.microsoft.com/en-us/research/wp-content/uploads/...:

> NP such as {NP,}*{(or, and)} NP
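
To make the pattern concrete, here is a rough sketch of how a "such as" match could be pulled out with a regex. This is not Probase's code: the names `extract_isa_pairs`, `_SUCH_AS` and `_SPLIT` are made up for illustration, a single capitalized word stands in for a real noun-phrase chunker, and the concept is simply taken to be the last one or two words before "such as".

```python
import re

# Assumption: a hyponym is one capitalized word (crude stand-in for an NP chunker).
_HYPONYM = r"[A-Z][\w-]*"

# Hearst pattern #1: NP such as {NP,}* {(or|and)} NP
_SUCH_AS = re.compile(
    r"\b((?:[\w-]+\s+)?[\w-]+)"   # concept: last one or two words before "such as"
    r",?\s+such\s+as\s+"
    rf"({_HYPONYM}(?:\s*,\s*(?:(?:and|or)\s+)?{_HYPONYM}|\s+(?:and|or)\s+{_HYPONYM})*)"
)

# Separators inside the hyponym list: ", ", ", and ", " and ", " or ".
_SPLIT = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def extract_isa_pairs(sentence):
    """Return (instance, concept) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in _SUCH_AS.finditer(sentence):
        concept = match.group(1)
        for instance in _SPLIT.split(match.group(2)):
            pairs.append((instance, concept))
    return pairs


quote = ("Let's put blatant State-shtuppers such as Hillary, Bernie, "
         "and Obama at about 7 or an 8.")
for instance, concept in extract_isa_pairs(quote):
    print(instance, "->", concept)
```

On the quoted sentence this pairs each of the three names with "blatant State-shtuppers", while a bare "Hillary is a candidate" produces nothing because there is no "such as" list to match.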

Hillary usually appears by herself, rather than in a list. Apparently Probase doesn't pick up the plentiful "X is a Y" associations, e.g. the "Hillary is a liar" from http://thefederalist.com/2015/08/27/poll-voters-overwhelming... or "Hillary is a candidate" from http://www.huffingtonpost.com/jeffrey-sachs/hillary-is-the-c...

Or maybe it does, and they're ranked down. They do have a truth-detection phase, but it's mostly syntactic, and the top categories all have negative examples ("Hillary is not a candidate", "Hillary is not a democrat", etc.).
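
For illustration only, a purely syntactic tally of "X is a Y" versus "X is not a Y" might look like the sketch below. I don't know what Probase's truth-detection phase actually does, so `tally_isa` and the regex are hypothetical; the point is just that string-level matching ends up collecting both positive and negated evidence for the same pair.

```python
import re
from collections import Counter

# Assumption: the entity is one capitalized word and the concept is one word.
_IS_A = re.compile(r"\b([A-Z]\w*) is (not )?an? (\w+)")


def tally_isa(sentences):
    """Count positive and negated 'X is a Y' statements per (X, Y) pair."""
    positive, negative = Counter(), Counter()
    for sentence in sentences:
        for entity, negation, concept in _IS_A.findall(sentence):
            # The optional "not " group is an empty string when absent.
            target = negative if negation else positive
            target[(entity, concept.lower())] += 1
    return positive, negative


positive, negative = tally_isa([
    "Hillary is a candidate",
    "Hillary is not a candidate",
    "Hillary is not a democrat",
])
print(positive[("Hillary", "candidate")], negative[("Hillary", "candidate")])
```

A ranking step could then weigh the two counters against each other, but how (or whether) Probase does that is a guess on my part.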

Wow, those are rather interesting "concepts". It's surprising that most of the top results are totally subjective (and yes, highly opinionated): 'unrepentant liar', 'gas bag', 'ruthless and totally corrupt politician'...

Clearly those associated concepts didn't come from the nytimes or wikipedia, so how can they ensure accuracy when scraping these unauthoritative sources?