by emrgx
3429 days ago
Alchemy is doing all the NLP. Concepts and entities (as defined by Alchemy in their documentation) are extracted from each article. I normalize each extracted term to prevent duplicates (a few still sneak through, so it still requires a little data maintenance). The result is that there is one node per term, say "Machine Learning." In one article "Machine Learning" is a concept with negative sentiment and high relevance; in another it is an entity with low relevance but positive sentiment. The relationships house the sentiment and relevance properties: (machine_learning)-[relevance,sentiment]-(article).

The suggested readings section pulls the most relevant concept of the current article and finds connected articles that share that concept at a high relevance. This way suggested articles are more than just keyword hits. It's all about relevance. I'm still tweaking this query, and there's a lot more that can be done with it, such as matching sentiment and emotion. As the dataset grows I'll look to add a feature that pulls a list of articles based on a cluster of highly associated entities.

As for Alchemy, I've tried a number of different NLP APIs and, in my opinion, none of them has come close to matching Alchemy's accuracy. It does make mistakes, but at a low enough rate that they're easy to correct manually.
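To make the model above concrete, here is a rough in-memory sketch in Python. It is not the actual Neo4j query, just an illustration of the idea: one key per normalized term, with relevance and sentiment stored on each term-to-article relationship. The names `Mention`, `normalize`, `suggested_readings`, the article ids, the value ranges, and the 0.7 relevance cutoff are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One term-to-article relationship with its per-article properties."""
    term: str         # normalized term key, e.g. "machine_learning"
    article: str      # hypothetical article identifier
    kind: str         # "concept" or "entity"
    relevance: float  # assumed range: 0.0 to 1.0
    sentiment: float  # assumed range: -1.0 to 1.0


def normalize(term: str) -> str:
    """Collapse case and whitespace so spelling variants share one key."""
    return "_".join(term.lower().split())


def suggested_readings(mentions, article, min_relevance=0.7):
    """Return other articles that share this article's top concept.

    The cutoff and the restriction to "concept" mentions on both sides
    are assumptions for this sketch.
    """
    concepts = [m for m in mentions
                if m.article == article and m.kind == "concept"]
    if not concepts:
        return []
    top = max(concepts, key=lambda m: m.relevance)
    related = [m for m in mentions
               if m.term == top.term
               and m.article != article
               and m.kind == "concept"
               and m.relevance >= min_relevance]
    # Most relevant matches first.
    related.sort(key=lambda m: m.relevance, reverse=True)
    return [m.article for m in related]


# Made-up sample data: the same term appears with different roles,
# relevance, and sentiment depending on the article.
MENTIONS = [
    Mention(normalize("Machine Learning"), "article_a", "concept", 0.9, -0.4),
    Mention(normalize("machine  learning"), "article_b", "concept", 0.8, 0.2),
    Mention(normalize("Machine Learning"), "article_c", "entity", 0.2, 0.6),
    Mention(normalize("Graph Databases"), "article_a", "concept", 0.5, 0.1),
    Mention(normalize("MACHINE LEARNING"), "article_d", "concept", 0.95, 0.3),
]

print(suggested_readings(MENTIONS, "article_a"))  # ['article_d', 'article_b']
```

Here `article_c` is skipped because "Machine Learning" is only a low-relevance entity there, which is the difference between a relevance match and a plain keyword hit.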
How are you finding Neo4j handles the scale of reading and writing all these stories? I've had a positive experience so far, but I'm only in the low thousands.