|
I built Drewes.NEWS as a way to learn NLP and serverless architecture, and to spot bias in news by reading the same story from multiple outlets. It has since evolved into a useful, privacy-focused tool: there are no cookies, no Google or Facebook components (such as Google Analytics or Ads), and no tracking of users whatsoever. There are bugs I'm aware of, but I'm looking for feedback on whether this format and function are useful. For those interested in the NLP side or the serverless side, I'd be happy to answer questions about how it was put together. The short version: I pull down RSS feeds from 33 news sites (approximately 1M stories in the database so far), store them, build a term frequency (TF) model, cluster the most recent 10k stories based on their TF similarity vectors, then store the story similarities. In the future I'd like to add paging, searching, and more filtering. I'm also thinking about giving each story its own URL that would show all the similar articles. That way, if you don't want to link directly to a particular news source, you can link to the drewesnews aggregate URL and the reader can pick whichever source they want to read. Any feedback would be much appreciated! |
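
For anyone curious about the NLP step, here is a minimal sketch of grouping stories by term frequency similarity. It is not the site's actual code: the post does not say which similarity measure or clustering method is used, so cosine similarity, the greedy grouping, the 0.5 threshold, and every function name below are assumptions made for illustration. Fetching the RSS feeds and storing the results are left out.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Map each lowercase word in the text to its relative frequency."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(stories, threshold=0.5):
    """Greedy grouping (an assumed stand-in for the real clustering step).

    Each story joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of story indices.
    """
    vectors = [term_frequencies(story) for story in stories]
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vector, vectors[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Made-up headlines, not data from the site.
stories = [
    "Senate passes budget bill after long debate",
    "Budget bill passes Senate after debate",
    "Local team wins championship game in overtime",
]
print(group_similar(stories))
```

With these three made-up headlines, the two budget stories land in one group and the sports story in another. A real pipeline would likely drop stop words or weight terms (TF-IDF, for example) so that common words do not dominate the similarity scores.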