Not a bad approach for identifying contextually similar content across different tweets.
I wonder whether a similar approach could be used to identify fake news by comparing an article's contextual similarity against an existing database of known fake news.
1. Fresh news keeps cropping up. It's a dynamic problem, not something that can be solved with a static dataset.
2. Verification is a harder problem than clustering. One difficulty is deciding what data to treat as ground truth. Technically, it breaks down into hard sub-problems: extracting each individual fact from a document, then verifying each one against the closest matching facts in a ground-truth article.
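
To make the second point a bit more concrete, here is a rough sketch of the "find the closest fact" step only. Everything in it is an assumption for illustration: the function names (`split_claims`, `closest_fact`, `match_claims`) are hypothetical, one sentence is naively treated as one fact, and plain word-count cosine similarity stands in for whatever contextual similarity measure you would really use. It retrieves a candidate fact to check against; it does not verify anything.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_claims(document):
    # Naive assumption: each sentence is one checkable claim.
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def closest_fact(claim, reference_facts):
    # Return (score, fact) for the most similar reference fact,
    # or (0.0, None) when there is nothing to compare against.
    if not reference_facts:
        return (0.0, None)
    claim_vec = Counter(tokenize(claim))
    scored = [(cosine(claim_vec, Counter(tokenize(f))), f) for f in reference_facts]
    return max(scored, key=lambda pair: pair[0])


def match_claims(document, reference_facts, threshold=0.5):
    # Pair every claim with its closest reference fact; claims with no
    # fact above the (arbitrary) threshold are paired with None.
    results = []
    for claim in split_claims(document):
        score, fact = closest_fact(claim, reference_facts)
        results.append((claim, fact if score >= threshold else None, score))
    return results


facts = [
    "The city council approved the new budget on Monday.",
    "The bridge was closed for repairs in March.",
]
doc = "The council approved the new budget on Monday. Aliens landed downtown yesterday."
for claim, fact, score in match_claims(doc, facts):
    print(f"{score:.2f} | {claim} -> {fact}")
```

Even in this toy form both problems show up: a claim with no nearby fact comes back unmatched rather than "false", and the reference list has to be kept current by hand as fresh news arrives.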