by aantix
2530 days ago
The Common Crawl corpus is already available and stored on S3, so analyzing billions of web pages is already possible with an AWS account and a simple MapReduce job. I'd actually advocate for publishing an anonymized list of actual search queries. Domain-specific search engines could then evolve based on demand for what has already been searched for.
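
To give a rough idea of what a "simple map reduce job" over crawl data looks like, here is a minimal local sketch in Python. It assumes the pages have already been read into (url, text) pairs; `map_page`, `reduce_counts`, `pages_per_host`, and the sample records are hypothetical names and data made up for illustration, not part of Common Crawl or any AWS API. A real job would read the crawl files from S3 and run the same two steps across many machines.

```python
from collections import Counter
from functools import reduce
from urllib.parse import urlparse


def map_page(record):
    """Map step: turn one (url, text) record into a count of 1 for its host."""
    url, _text = record
    host = urlparse(url).netloc.lower()
    return Counter({host: 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial per-host counts."""
    return left + right


def pages_per_host(records):
    """Run the map step on every record, then fold the results together."""
    return reduce(reduce_counts, map(map_page, records), Counter())


# Hypothetical sample records standing in for crawled pages.
sample = [
    ("https://example.com/a", "first page"),
    ("https://example.com/b", "second page"),
    ("https://example.org/", "third page"),
]

if __name__ == "__main__":
    for host, count in pages_per_host(sample).most_common():
        print(host, count)
```

The same shape would work for the query list idea: swap the host for a normalized query string and the counts show which topics are searched for most.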