by rhmw2b
2551 days ago
Google's paper on Percolator from 2010 says there are more than 1T web pages. Nine years later there are surely far more than that. https://ai.google/research/pubs/pub36726 The real issue would be crawling and indexing all those pages. How long would it take an average user's computer on a 10 Mbps internet connection to crawl the entire web? It's not as easy a problem as you make it seem.
I have a gigabit link to my apartment (go Swedish infrastructure!). At that theoretical speed I get 450 GB an hour, so I could download about ten terabytes in a day. We could easily slow that down by an order of magnitude and it's still a very viable thing to do. If someone wrote the software to do this, one could imagine some kind of federated solution for downloading the data, so that every user doesn't have to hit every web server.
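
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 Gbit/s = 10^9 bits/s), a fully saturated link, and no protocol overhead. The function names are made up for illustration, and the average page size in the last call is a placeholder assumption, not a number from the thread or the paper.

```python
def bytes_per_second(link_mbps: float) -> float:
    """Convert a link speed in megabits per second to bytes per second."""
    return link_mbps * 1_000_000 / 8


def gb_per_hour(link_mbps: float) -> float:
    """Gigabytes transferred in one hour at full line rate."""
    return bytes_per_second(link_mbps) * 3600 / 1e9


def tb_per_day(link_mbps: float) -> float:
    """Terabytes transferred in one day at full line rate."""
    return bytes_per_second(link_mbps) * 86_400 / 1e12


def days_to_fetch(pages: float, avg_page_bytes: float, link_mbps: float) -> float:
    """Days needed to download `pages` pages of `avg_page_bytes` each.

    Ignores latency, politeness delays, and parsing/indexing time, so a
    real crawl would take longer than this estimate.
    """
    total_bytes = pages * avg_page_bytes
    return total_bytes / bytes_per_second(link_mbps) / 86_400


if __name__ == "__main__":
    print(gb_per_hour(1000))  # gigabit link: 450.0 GB per hour
    print(tb_per_day(1000))   # gigabit link: about 10.8 TB per day
    print(tb_per_day(100))    # an order of magnitude slower

    # Hypothetical inputs: 1 trillion pages, 100 kB per page (assumed).
    print(days_to_fetch(1e12, 100_000, 1000))
```

The total time depends almost entirely on the assumed page count and average page size, so plug in your own numbers before drawing conclusions.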