by amitamb
2616 days ago
Apart from that, Common Crawl respects robots.txt (which makes sense), so many sites you would expect to see there are not indexed: Netflix, Facebook, LinkedIn and many more. If Common Crawl sees serious adoption, those sites will modify their robots.txt, but it's a chicken-and-egg problem.
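
To make the robots.txt point concrete, here is a rough sketch of how a site's rules can shut out one crawler while allowing everyone else. It uses Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are made up for illustration and are not taken from any real site; `CCBot` is the user agent Common Crawl's crawler identifies as.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: check a URL against robots.txt text held in memory."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt (an assumption for this example): block CCBot, allow the rest.
SAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

ccbot_ok = is_allowed(SAMPLE_ROBOTS, "CCBot", "https://example.com/page")
other_ok = is_allowed(SAMPLE_ROBOTS, "SomeOtherBot", "https://example.com/page")
print("CCBot allowed:", ccbot_ok)
print("Other bot allowed:", other_ok)
```

A site that wanted to be included would only need to drop or relax the `CCBot` group, which is the kind of robots.txt change I mean above.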