by stochtastic 911 days ago
This is a fun dataset. The paper leaves a slight misimpression about channel statistics: IIUC, they do not reweight by sampling propensity when looking at subscriber counts. Each channel should be weighted by ~1/(number of videos on that channel), since the probability of a given channel appearing in the sample is proportional to the number of public videos that channel has (as long as the sample is a small fraction of the population).
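To make the reweighting concrete, here is a minimal sketch under stated assumptions. The toy population (990 small channels with 1 video each, 10 large channels with 99 videos each) is made up for illustration and is not taken from the paper; the function names are hypothetical. It assumes videos are drawn uniformly at random and that each draw records the subscriber count and public video count of its channel.

```python
import random

def naive_fraction_at_least(samples, threshold):
    """Unweighted share of sampled videos whose channel has >= threshold subscribers."""
    return sum(1 for subs, _ in samples if subs >= threshold) / len(samples)

def weighted_fraction_at_least(samples, threshold):
    """Inverse-propensity estimate of the share of channels with >= threshold subscribers.

    samples: list of (subscribers, n_videos) pairs, one per sampled video.
    Each draw is weighted by 1 / n_videos, because a channel's chance of
    appearing is proportional to how many public videos it has.
    """
    total = sum(1.0 / n_videos for _, n_videos in samples)
    hits = sum(1.0 / n_videos for subs, n_videos in samples if subs >= threshold)
    return hits / total

# Toy population (assumed numbers): 1% of channels are large,
# but large channels own half of all videos.
channels = [(100, 1)] * 990 + [(1_000_000, 99)] * 10

# One entry per video, so drawing from this list samples videos uniformly.
videos = [ch for ch in channels for _ in range(ch[1])]

rng = random.Random(0)            # fixed seed so the run is repeatable
sample = rng.choices(videos, k=200)

print("naive share of 1M+ channels:   ", naive_fraction_at_least(sample, 1_000_000))
print("weighted share of 1M+ channels:", weighted_fraction_at_least(sample, 1_000_000))
```

In this toy setup the naive share should land near one half, while the weighted share should land near the true 1% of channels, which is the kind of gap the percentile figures would be exposed to.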
1 comment

I noticed that too. Seems very unlikely that 1,000,000 subscribers represents the 98th percentile and not the 99.999th.