by jsnell
2159 days ago
How do you know which copies are bogus? It can't be just by saying that the one you have the most copies of is the right one. The problem is that most legit copies will be subtly different, while an attacker trying to forge page contents can make their copies identical. You can't do fuzzy matching when deciding what to store, since that would require all the nodes to agree on the fuzzy matching algorithm. That's going to mean hard-coding a complex algorithm that requires constant updates into your blockchain infra.

A proof of work does not seem viable either. You're asking the submitters to pass it for no reward, so the difficulty factor can't be particularly high. But then it becomes useless at blocking somebody who is actually deriving a benefit from submitting (fake) results.

The giant company will in this case build an index that's far superior. The crowd-sourced version will have huge amounts of duplication of popular pages, and massive underrepresentation of the long tail. And can you imagine how inefficient the distributed version will be on both storage and bandwidth? There can't be any facility for scheduling pages to be crawled at sensible intervals given the push model. The indexing nodes will just be flooded with pages they didn't actually want.

The crowd-sourced version will also not be "random people" like you suggested. A lot of them will have an agenda, and will be trying to manipulate the index to meet that agenda, in a way that's not useful to the people making searches. At least the company's goal of making money is furthered by building as useful an index as they can given the resource constraints.
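To make the "most copies wins" problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the page strings, the `digest` and `majority_copy` helpers, and the choice of SHA-256 exact matching as the way nodes compare copies. It is not how any particular system works, just the simplest rule that nodes could agree on without a shared fuzzy matcher.

```python
import hashlib
from collections import Counter


def digest(page: str) -> str:
    """Exact-match fingerprint of a page (hypothetical comparison rule)."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def majority_copy(submissions: list[str]) -> str:
    """Return the submission whose exact content was submitted most often."""
    counts = Counter(digest(p) for p in submissions)
    winner, _ = counts.most_common(1)[0]
    return next(p for p in submissions if digest(p) == winner)


# Five honest crawlers fetch the same page, but each copy differs slightly
# (here, a made-up render timestamp), so every copy hashes differently.
legit = [f"<html>Real article. Rendered at 12:00:0{i}</html>" for i in range(5)]

# One attacker submits just two byte-identical forged copies.
forged = ["<html>Forged article.</html>"] * 2

winner = majority_copy(legit + forged)
print(winner)
```

With exact matching, the five honest copies each count once and the two identical forgeries count twice, so the forgery wins despite being outnumbered. Avoiding that needs some notion of "close enough", which is exactly the fuzzy matching algorithm every node would have to agree on.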
The search engine page can be used for validation: just let people who press the back button on a page tell you whether the results were useful or not.