Hacker News
by 1vuio0pswjnm7 2101 days ago
Also, there are alternative, publicly accessible ways to get most of this public zone file data now, so I am not sure that restriction in the access agreement is anything more than an historical artifact at this point.

You could use publicly available scan data for ports 80 and 443 to pare down the list of "websites".
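A minimal sketch of that paring step in Python. It assumes the zone file is read line by line, that each domain has already been resolved to a single IPv4 address (e.g. by a bulk DNS pass), and that the scan data has been reduced to a set of IPs seen with port 80 or 443 open. The function names, sample zone lines and addresses (taken from the documentation ranges) are all hypothetical.

```python
# Hypothetical sketch: pare a zone-file domain list down to hosts that
# answered on port 80 or 443 in some public scan dataset.

def domains_from_zone(lines, tld):
    """Collect unique domain names that have NS (delegation) records.

    Assumes every record line starts with its owner name (no inherited
    owners) and ignores $ORIGIN/$TTL directives and comments.
    """
    domains = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith((";", "$")):
            continue
        tokens = line.lower().split()
        # Look for the NS type between the owner name and the rdata.
        if len(tokens) >= 3 and "ns" in tokens[1:-1]:
            owner = tokens[0]
            if owner.endswith("."):
                name = owner.rstrip(".")  # absolute name, e.g. "parked.com."
            else:
                name = f"{owner}.{tld}"   # relative name, e.g. "example"
            domains.add(name)
    return domains


def pare_down(domains, resolved, open_web_ips):
    """Keep domains whose resolved IPv4 address showed port 80 or 443 open."""
    return sorted(d for d in domains if resolved.get(d) in open_web_ips)


zone_lines = [
    "$ORIGIN com.",
    "; hypothetical sample records",
    "example NS ns1.example.net.",
    "example NS ns2.example.net.",
    "parked.com. 172800 IN NS ns1.parking.test.",
    "quiet NS ns1.quiet.test.",
]
resolved = {  # hypothetical bulk-DNS results (documentation IP ranges)
    "example.com": "192.0.2.10",
    "parked.com": "198.51.100.7",
    "quiet.com": "203.0.113.5",
}
open_web_ips = {"192.0.2.10", "198.51.100.7"}  # hypothetical 80/443 scan hits

domains = domains_from_zone(zone_lines, "com")
print(pare_down(domains, resolved, open_web_ips))  # ['example.com', 'parked.com']
```

Real zone files also use inherited owner names and other record types, so a production parser would need more care than this.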

The goal of exposing the non-popular web is worthwhile.


You could port scan the entire IPv4 address space (minus all reserved addresses), send a GET request to every host that responds, and filter for valid HTML. It would take no more than 5 hours on a shitty PC, and a lot less on a small AWS instance.
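A rough sketch of the per-address pieces of that pipeline in Python. It assumes "valid HTML" just means "looks like an HTML document" (a crude heuristic, not a validator) and that the standard library's `ipaddress` classification is a good enough stand-in for "reserved addresses". The helper names are hypothetical, and a real Internet-wide scan would more likely use a dedicated scanner such as ZMap or masscan than a Python loop. The last function only does the arithmetic on the 5-hour figure: covering all 2**32 addresses in 5 hours means roughly 239,000 probes per second.

```python
import http.client
import ipaddress
from html.parser import HTMLParser


def is_scan_target(addr: str) -> bool:
    """Skip private, reserved and multicast space; keep publicly routable IPv4."""
    ip = ipaddress.IPv4Address(addr)
    return ip.is_global and not ip.is_multicast


def fetch_root(addr: str, timeout: float = 5.0) -> str:
    """GET / over plain HTTP by IP address alone (no real host name)."""
    conn = http.client.HTTPConnection(addr, 80, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read(65536).decode("utf-8", errors="replace")
    finally:
        conn.close()


class _HtmlSniffer(HTMLParser):
    """Crude heuristic: did we see a doctype or an html/head/body tag?"""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_decl(self, decl):
        if decl.lower().startswith("doctype html"):
            self.found = True

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head", "body"):  # HTMLParser lowercases tag names
            self.found = True


def looks_like_html(body: str) -> bool:
    sniffer = _HtmlSniffer()
    sniffer.feed(body)
    sniffer.close()
    return sniffer.found


def required_rate(hours: float) -> float:
    """Probes per second needed to cover all 2**32 addresses in `hours`."""
    return 2**32 / (hours * 3600)


print(round(required_rate(5)))  # 238609
```

`fetch_root` is defined but not exercised here because it needs network access. Note that it connects by IP only, so on shared hosting it sees whatever the server's default site is.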
Most non-major sites are on shared hosting. Without a host name, you won't get anything useful, unfortunately.
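To see why, here is a self-contained Python toy of name-based virtual hosting, running entirely on localhost with hypothetical site names: one server, one IP, and the `Host` header decides which site comes back. A request by bare IP, which is all an address scan has, only gets the default page. On port 443 the TLS SNI host name plays the same role, which this sketch does not cover.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sites sharing one IP address, keyed by host name.
SITES = {
    "alice.example": b"<html><body>Alice's blog</body></html>",
    "bob.example": b"<html><body>Bob's shop</body></html>",
}
DEFAULT_PAGE = b"<html><body>Default hosting page</body></html>"


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Name-based virtual hosting: the Host header, not the IP, picks the site.
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host, DEFAULT_PAGE)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, host_header=None):
    """GET / from the local server, optionally overriding the Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    headers = {"Host": host_header} if host_header else {}
    conn.request("GET", "/", headers=headers)
    body = conn.getresponse().read()
    conn.close()
    return body


server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

by_ip = fetch(port)                    # what an IP-only scan sees
alice = fetch(port, "alice.example")   # same IP, different host name
bob = fetch(port, "bob.example")
server.shutdown()
server.server_close()
print(by_ip, alice, bob)
```

Without the `Host` override, `http.client` sends `Host: 127.0.0.1:<port>`, which matches no site and falls through to the default page.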
Most major sites are on shared hosting. (Sadly)
Thanks, I appreciate the feedback!