Hacker News
by owenshen24 2574 days ago
Even as the person who wrote the scraping code for the original project, I'm a little suspicious of the news data myself, because the two distributions turn out to be so similar.

I think the strong similarity is an artifact of our data collection process rather than a reflection of some deep truth about how similar the two sources are, or about sources in general. My prior was that the distributions would look more different, but I didn't do any extra verification at the time.