| The tables of TLD frequency on page 4 of the stats report are interesting, though they leave me confused about how the crawler actually crawls and when it stops: https://docs.google.com/file/d/1_9698uglerxB9nAglvaHkEgU-iZN... Table 2a purports to show the frequency of SLDs (rank, domain, count, fraction): 1 youtube.com 95,866,041 0.0250; 2 blogspot.com 45,738,134 0.0119; 3 tumblr.com 30,135,714 0.0079; 4 flickr.com 9,942,237 0.0026; 5 amazon.com 6,470,283 0.0017; 6 google.com 2,782,762 0.0007; 7 thefreedictionary.com 2,183,753 0.0006; 8 tripod.com 1,874,452 0.0005; 9 hotels.com 1,733,778 0.0005; 10 flightaware.com 1,280,875 0.0003. If I'm reading this correctly, the crawler managed to hit a huge number of YouTube video pages, but still only a fraction of them. I couldn't find a total count of YouTube videos, but YouTube's own stats page says 200 million videos have been tagged with Content-ID alone (that is, identified as belonging to movie/TV studios). In any case, it's surprising not to see Wikipedia on there. English Wikipedia has 4+ million articles, so it should be ahead of thefreedictionary.com. |
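
For anyone who wants to sanity-check the arithmetic behind this, here is a rough Python sketch using the Table 2a rows quoted above. It rests on two assumptions of mine, not anything stated in the report: that the last column is each domain's share of the whole crawl (so count / fraction estimates the total crawl size), and that 4,000,000 is a fair lower bound for English Wikipedia if every article had been crawled. The function names are made up for illustration.

```python
# Rows copied from Table 2a as quoted above: (domain, count, fraction).
ROWS = [
    ("youtube.com", 95_866_041, 0.0250),
    ("blogspot.com", 45_738_134, 0.0119),
    ("tumblr.com", 30_135_714, 0.0079),
    ("flickr.com", 9_942_237, 0.0026),
    ("amazon.com", 6_470_283, 0.0017),
    ("google.com", 2_782_762, 0.0007),
    ("thefreedictionary.com", 2_183_753, 0.0006),
    ("tripod.com", 1_874_452, 0.0005),
    ("hotels.com", 1_733_778, 0.0005),
    ("flightaware.com", 1_280_875, 0.0003),
]

# Assumption: a lower bound taken from the "4+ million articles" figure.
WIKIPEDIA_ARTICLES = 4_000_000


def implied_total(count, fraction):
    """Estimate the total crawl size, assuming fraction = count / total."""
    return count / fraction


def hypothetical_rank(count, rows):
    """1-based rank a domain with this count would have in the table."""
    return 1 + sum(1 for _, other, _ in rows if other > count)


if __name__ == "__main__":
    # Fractions are rounded to four decimals, so the lower rows are noisy.
    for domain, count, fraction in ROWS:
        total = implied_total(count, fraction)
        print(f"{domain:24s} implied total ~ {total / 1e9:.2f} billion")

    rank = hypothetical_rank(WIKIPEDIA_ARTICLES, ROWS)
    print(f"Wikipedia at {WIKIPEDIA_ARTICLES:,} pages would rank #{rank}")
```

Under those assumptions the top rows imply a crawl of roughly 3.8 billion pages, and a fully crawled English Wikipedia would land between amazon.com and google.com, well ahead of thefreedictionary.com.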