Search engines are so important that they should be open source and distributed across the community. They're about the most basic infrastructure the internet needs to work, and we are outsourcing it.
Obviously not at the same level as Google, and there are other parts to it. But I believe we can do this together if we try. People were talking about building their own search engine on the Elixir forums a while ago and many seemed interested.
I was thinking exactly this when I stumbled upon your comment, except I figured it should work for any private tab and it'd also need a browser that makes tabs private (and contained) by default.
It's a problem more easily solved by VC-backed companies or government regulation, because we're not going to see Google doing that in this lifetime, while FOSS solutions simply won't get the needed traction.
The same thing that happens when a peer accesses a torrent that is illegal in their country? How is this relevant? It is a decentralized system, so it shouldn't make a difference.
So, couldn't you keep a database? Many people would upload the same URL, and whichever copies are bogus would get a low score, like shadowbanning. Say, use a DHT with proof-of-work and things should work? Obviously I'm oversimplifying, but I see it as a problem already solved by blockchains.
Also, ethically speaking, aren't we at the point where trusting random people is smarter than trusting huge corporations whose only goal is to make more and more money?
How do you know which copies are bogus? It can't be just by saying that the one you have the most copies of is the right one. The problem is that most legit copies will be subtly different, while an attacker trying to forge page contents can make their copies identical. You can't do fuzzy matching when deciding what to store, since that would require all the nodes to agree on the fuzzy matching algorithm. That's going to mean hard-coding a complex algorithm that requires constant updates into your blockchain infra.
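To make that concrete, here is a minimal sketch of "most copies wins" using exact content hashes. The page contents, the timestamp comment, and the `plurality_winner` helper are all made up for illustration; no real system is being described.

```python
import hashlib
from collections import Counter


def digest(page):
    """Exact content hash: a one-byte difference gives a different digest."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def plurality_winner(submissions):
    """Return (page, votes) for the copy whose exact hash was submitted most often."""
    by_hash = {digest(page): page for page in submissions}
    counts = Counter(digest(page) for page in submissions)
    top_hash, votes = counts.most_common(1)[0]
    return by_hash[top_hash], votes


# Hypothetical data: five honest fetches of one URL, each differing only
# by a render timestamp, so every honest copy hashes differently.
honest = [f"<h1>Real article</h1><!-- rendered at 12:00:0{i} -->" for i in range(5)]
# Two byte-identical forged copies from a single attacker.
forged = ["<h1>Fake article</h1>"] * 2

winner, votes = plurality_winner(honest + forged)
print(winner, votes)
```

Five honest submitters lose to two forged copies here, because each honest copy counts as its own one-vote candidate. Fixing that means normalizing or fuzzy-matching pages before hashing, which is exactly the algorithm every node would have to agree on.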
A proof of work does not seem viable either. You're asking for the submitters to pass it for no reward, so the difficulty factor can't be particularly high. But then it becomes useless at blocking somebody who is actually deriving a benefit from submitting (fake) results.
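For reference, a hashcash-style proof of work looks roughly like the sketch below. It assumes SHA-256 and a difficulty measured in leading zero bits; the `mint` and `verify` names and the `payload:nonce` encoding are illustrative choices, not anything proposed in this thread.

```python
import hashlib
from itertools import count


def leading_zero_bits(data):
    """Number of zero bits at the start of a byte string."""
    return len(data) * 8 - int.from_bytes(data, "big").bit_length()


def stamp_hash(payload, nonce):
    return hashlib.sha256(f"{payload}:{nonce}".encode("utf-8")).digest()


def mint(payload, difficulty_bits):
    """Search for a nonce; takes about 2**difficulty_bits hashes on average."""
    for nonce in count():
        if leading_zero_bits(stamp_hash(payload, nonce)) >= difficulty_bits:
            return nonce


def verify(payload, nonce, difficulty_bits):
    """Checking a stamp costs one hash, no matter the difficulty."""
    return leading_zero_bits(stamp_hash(payload, nonce)) >= difficulty_bits


# Hypothetical submission: a URL stamped at a deliberately low difficulty.
url = "https://example.com/some-page"
nonce = mint(url, 12)
print(nonce, verify(url, nonce, 12))
```

The cost per submission is the same for everyone. A difficulty low enough for unpaid volunteers to tolerate is also low enough for someone who profits from each fake submission, and raising it drives the volunteers away first.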
The giant company will in this case build an index that's far superior. The crowd-sourced version will have huge amounts of duplication of popular pages, and massive underrepresentation of the long tail. And can you imagine how inefficient the distributed version will be, both on storage and on bandwidth? With a push model there can't be any facility for scheduling pages to be crawled at sensible intervals. The indexing nodes will just be flooded with pages they didn't actually want.
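For contrast, this is roughly what a pull-model crawler can do and a push-model index cannot: decide for itself when each page is next due. It's a toy sketch only; the `RecrawlScheduler` class, the URLs, and the intervals are invented for illustration.

```python
import heapq


class RecrawlScheduler:
    """Toy pull-model scheduler: each URL is re-fetched at its own interval."""

    def __init__(self):
        self._queue = []  # heap of (next_due_time, url, interval_seconds)

    def add(self, url, interval_seconds, now=0.0):
        heapq.heappush(self._queue, (now + interval_seconds, url, interval_seconds))

    def pop_due(self, now):
        """Return the URLs due at `now` and reschedule each one from `now`."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            _, url, interval_seconds = heapq.heappop(self._queue)
            due.append(url)
            heapq.heappush(self._queue, (now + interval_seconds, url, interval_seconds))
        return due


scheduler = RecrawlScheduler()
scheduler.add("https://example.com/news", 60)       # changes often
scheduler.add("https://example.com/archive", 3600)  # rarely changes
print(scheduler.pop_due(60))
```

With submissions pushed by volunteers, nobody owns that queue, so popular pages arrive far more often than needed and rarely visited ones may never arrive at all.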
The crowd-sourced version also won't be run by "random people" as you suggested. A lot of them will have an agenda, and will be trying to manipulate the index to serve that agenda, in ways that are not useful to the people making searches. At least the company's goal of making money is furthered by building as useful an index as it can given its resource constraints.
The search results page can be used for validation: just let people who press the back button to return to it tell you whether the results were useful or not.