by zozbot234
2349 days ago
The Common Crawl is a thing already. Unfortunately, a "full" text crawl of the internets is a YUUUGE amount of data to manage, and I can't think of anything that could change that in the foreseeable future. That's why I think providing a federated Web directory standard, à la ODP/DMOZ except not limited to a single source, would be a far more impactful development.
Maybe instead of a problem, there is an opportunity here.
Back before Google ate the intarwebs, there used to be niche search engines. Perhaps that is an idea whose time has come again.
For example, if I want information from a government source, I use a search engine that specializes in crawling only government web sites.
If I want information about Berlin, I use a search engine that only crawls web sites with information about Berlin, or that are located in Berlin.
If I want information about health, I use a search engine that only crawls medical web sites.
Each topic is still a wealth of information, but siloed enough that the amount of data could be manageable for a small or medium-sized company. And the market would keep the niches from getting so small that they become useless. A search engine dedicated to Hello Kitty lanyards isn't going to monetize.