| Is it possible to write a script to scan all the internet (or at least the popular websites) and determine which ones are blocked?
If you can find or make a list of websites you want to scan, you can script it. The biggest problem is doing it in a way that doesn't bring you to the attention of those doing the blocking.
1. Where can I find a list of all domain names, top 1000, top 100000?
Alexa's "top 1,000,000" list (~10.2 MB download) is at http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
2. Is it possible to conclusively determine censorship from headers only or do I have to load the whole page and compare HTML code with a sample? Bandwidth is very expensive here.
It depends on the method used to block you from visiting a website. If DNS-based blocking is used, you can use very small DNS lookups to identify whether or not a website is blocked: all of the hostnames of blocked websites will probably resolve to the same IP address. (You can check this with "nslookup www.website.com" in Windows or "host www.website.com" on Linux, OS X, etc.) If this method works, it's probably the best way, since DNS requests are less likely to be logged than HTTP requests, and DNS requests and responses are small.
If the blocking uses a transparent proxy instead of forged DNS records, you could use HTTP HEAD requests and match against the "Server" header in the reply:
Server: Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.7d mod_wsgi/3.2 mod_perl/1.29 PHP/4.4.1
The software listed in that "Server" header is terribly old, and I doubt you'll find any other web server on the Internet with that exact combination of software versions. So that could be a way to identify the server serving the "website blocked" page without downloading entire pages, but it might draw attention to you if you do it for thousands of websites. |
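For what it's worth, here is a rough Python sketch of the two checks described in the answer above. It is only an illustration under some assumptions, not a tested scanner: the function names, the `min_hosts` threshold, and the example hostnames are all made up for this post, and the "same IP" test is just a heuristic, since unrelated sites hosted on the same CDN can legitimately share an address. The only value taken from the answer is the "Server" header string.

```python
import http.client
import socket
from collections import defaultdict

# "Server" header quoted in the answer above; treating it as the block-page
# fingerprint is an assumption that only holds for that particular network.
BLOCK_PAGE_SERVER = ("Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.7d "
                     "mod_wsgi/3.2 mod_perl/1.29 PHP/4.4.1")


def resolve(hostname):
    """Return one IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        return None


def group_by_address(resolved):
    """Map each IP address to the sorted hostnames that resolved to it."""
    groups = defaultdict(list)
    for host, addr in resolved.items():
        if addr is not None:
            groups[addr].append(host)
    return {addr: sorted(hosts) for addr, hosts in groups.items()}


def suspected_block_addresses(resolved, min_hosts=3):
    """Addresses shared by at least min_hosts hostnames (heuristic only)."""
    return {addr: hosts
            for addr, hosts in group_by_address(resolved).items()
            if len(hosts) >= min_hosts}


def server_header(hostname, timeout=5.0):
    """Send a single HTTP HEAD request and return the Server header, if any."""
    conn = http.client.HTTPConnection(hostname, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Server")
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def looks_like_block_page(header, fingerprint=BLOCK_PAGE_SERVER):
    """True if the Server header exactly matches the known fingerprint."""
    return header is not None and header.strip() == fingerprint


if __name__ == "__main__":
    # Placeholder hostnames; substitute your own (short) list.
    hosts = ["www.example.com", "www.example.org", "www.example.net"]
    resolved = {h: resolve(h) for h in hosts}
    print(suspected_block_addresses(resolved, min_hosts=2))
```

The DNS part only needs one small lookup per hostname, which fits the bandwidth concern in the question. The HEAD check is kept as a separate function so you can skip it entirely if the DNS grouping already gives you an answer.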
I think this is a HUGE issue that should not be taken lightly. A guy scanning certain websites from Iran IS going to attract some attention no matter how benign his motives. It just won't be taken lightly. That attention can land you on lists you don't want to be on.
I'm not saying that I don't sympathize with his/her situation... I just think that certain actions can be viewed as hostile by people with a security mindset. Indeed, it may only increase the number of sites being blocked, as well as SEVERELY restricting his/her ability to travel without being arrested. And if you attract enough of the right attention... you may find that being arrested is the least of your worries.
And all of this doesn't even take into account what Iranian authorities may do from their end.
Advice like this, given on a public forum via easily identifiable pseudonyms, should be taken with a BIG grain of salt.