I know a lot of people are asking for this, but listing all the URLs would literally fill up this whole page.
The best way I can describe it is "publicly allowable URLs on the web", which includes blogs, forums, social networks, websites, and more. If we can pick up the HTML, RSS, Atom, JSON, or plain text, then we try to get it.
We don't scrape any sites that disallow it in their robots.txt, and we don't scrape material that is only available behind logins or paywalls.
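For anyone curious what a robots.txt check can look like, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration, not our actual crawler code: the function name is_allowed, the user agent "example-crawler", and the sample rules are all made up for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/blog/post",
        "https://example.com/private/page",
    ):
        verdict = "fetch" if is_allowed(EXAMPLE_ROBOTS, "example-crawler", url) else "skip"
        print(url, "->", verdict)
```

With these sample rules, the blog URL would be fetched and the URL under /private/ would be skipped. A real crawler would download each site's own robots.txt first, and it would still need a separate check for logins and paywalls, which robots.txt does not cover.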