Hacker News
by draugadrotten 4523 days ago
1. Where can I find a list of all domain names, top 1000, top 100000?

Alexa (http://www.alexa.com/topsites) could provide you with this data, but only for the "top 500".

I know about Alexa, but 500 is too small for statistical analysis.
Look for the link there to download a list of the top million domains (according to them, of course).

Edit: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Thanks.
If the blocklist is manually curated then the probability of a website being blocked will depend on its popularity. I wouldn't just be interested in "X% of sites blocked," I'd look at "Sites seeing Y% of web traffic blocked" etc.
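To make the difference between "X% of sites blocked" and "sites seeing Y% of web traffic blocked" concrete, here is a rough sketch. The ranked list only gives ranks, not traffic numbers, so the traffic weights below are an assumption on my part: a Zipf-like 1/rank curve. The function name `blocked_shares` and the `exponent` parameter are made up for illustration.

```python
def blocked_shares(ranked_domains, blocked, exponent=1.0):
    """Compare the plain share of blocked sites with a traffic-weighted share.

    ranked_domains: domains ordered by popularity, most popular first.
    blocked: set of domains known to be blocked.
    exponent: shape of the ASSUMED Zipf-like traffic curve (weight = 1 / rank**exponent).
    Returns (fraction_of_sites_blocked, estimated_fraction_of_traffic_blocked).
    """
    if not ranked_domains:
        return 0.0, 0.0

    # Assumed traffic weight per rank; real traffic data would replace this.
    weights = [1.0 / (rank ** exponent) for rank in range(1, len(ranked_domains) + 1)]

    blocked_count = 0
    blocked_weight = 0.0
    for domain, weight in zip(ranked_domains, weights):
        if domain in blocked:
            blocked_count += 1
            blocked_weight += weight

    return blocked_count / len(ranked_domains), blocked_weight / sum(weights)
```

With four sites and only the top one blocked, the site share is 25% while the estimated traffic share is close to half, which is why a manually curated blocklist that targets popular sites looks much worse when weighted by traffic.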
It is a combination of manual and automatic blocking. The Facebook censorship is manual. The Dick Cheney Wikipedia page is blocked because they have added "Dick" to their automatic blacklist, so it gets censored regardless of the context.
So you can't connect to Wikipedia using HTTPS? What's the policy on HTTPS in general?

Edit: Never mind, you already answered it in another comment.

Alexa also publishes a list of the top 1,000,000 sites, updated daily:

http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
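
For the "top 1000, top 100000" part of the question, a minimal sketch of cutting the big list down to the first N entries. It assumes the zip holds a single file named top-1m.csv with one `rank,domain` pair per line; I have not verified the current layout, so treat the member name and the column order as assumptions. The function name `read_top_sites` is made up for illustration.

```python
import csv
import io
import zipfile


def read_top_sites(zip_bytes, limit=1000, member="top-1m.csv"):
    """Return the first `limit` (rank, domain) pairs from a zipped CSV.

    zip_bytes: raw bytes of the downloaded zip archive.
    member: ASSUMED name of the CSV inside the archive.
    """
    sites = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
            for row in reader:
                if len(sites) >= limit:
                    break
                if len(row) != 2:
                    continue  # skip blank or malformed lines
                sites.append((int(row[0]), row[1]))
    return sites
```

The bytes can come from anywhere, for example `urllib.request.urlopen(url).read()` on the link above if it still resolves, or a copy saved to disk. Stopping at `limit` avoids holding the full million rows in memory when only the top 1000 are needed.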