I'd argue that this is highly dependent on the type of data you scrape and the what you want to do with the data.
If you have a good data model the categorizing, storing and searching of the final result the isn't a big problem and the scraping is the complicated part. If you don't have a specific kind of resource you are scraping and just dump everything into some storage solution with no structure that's going to be the hard part while scraping is the easy part.
In theory, say you want to index one billion (10^9) web sites. Using modern hardware, you should be able to crawl, 10,000 web pages per second, which would take ca 30 hours, and if you save 1kb of text from each web site, that would be ca 1 TB of data. Doing a text search of 1TB of text would take some time though, maybe minutes. You could partition the data between servers though.
If you have a good data model the categorizing, storing and searching of the final result the isn't a big problem and the scraping is the complicated part. If you don't have a specific kind of resource you are scraping and just dump everything into some storage solution with no structure that's going to be the hard part while scraping is the easy part.