by hombre_fatal
263 days ago
Well, flesh it out more and it doesn't sound solved at all. How do you make it trustless? How do you fetch/crawl the index when it's scattered across arbitrary devices? How do you index the decentralized index? What is actually stored on nodes? When you want to do something useful with the crawled info, what does that look like?
You'd figure out a replication strategy based on observed reliability (Lindy effect + uptime %).
It would be less "5 million flaky randoms" and more "5,000 very reliable volunteers".
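To make the replication idea a bit more concrete, here's a rough sketch of picking replica holders for one shard. It assumes each node reports how long it has been around (`age_days`) and what fraction of health checks it answered (`uptime`). The score `age_days * uptime` is just one made-up way to fold the Lindy effect and uptime together, and `min_uptime` is a hypothetical cutoff, not anything standard.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    age_days: float  # how long the node has been observed in the network
    uptime: float    # fraction of health checks answered, 0.0 to 1.0


def reliability_score(node: Node) -> float:
    # Hypothetical Lindy-style heuristic: a node that has already survived
    # longer is assumed more likely to keep surviving, so weight its
    # observed uptime by its age.
    return node.age_days * node.uptime


def pick_replicas(nodes: list[Node], k: int, min_uptime: float = 0.9) -> list[str]:
    """Choose the k highest-scoring nodes to hold one shard of the index."""
    # Drop flaky nodes outright, no matter how old they are.
    eligible = [n for n in nodes if n.uptime >= min_uptime]
    ranked = sorted(eligible, key=reliability_score, reverse=True)
    return [n.node_id for n in ranked[:k]]


nodes = [
    Node("a", 900, 0.99),
    Node("b", 30, 1.0),   # perfect uptime but brand new
    Node("c", 2000, 0.5), # old but flaky, filtered out
    Node("d", 400, 0.95),
]
print(pick_replicas(nodes, 2))  # ['a', 'd']
```

The point of the hard uptime floor is that age alone shouldn't rescue a node that's offline half the time; you want the "5,000 very reliable volunteers" to float to the top.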
Though for the crawling layer you can and should absolutely utilize 5 million flaky randoms. That's actually the holy grail of crawling: each random consumer device makes just one request.
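A minimal sketch of the "one request per device" crawl round, assuming a central queue of URLs and a `fetch(device, url)` callback that returns the page body or `None` when the flaky device drops the request. Both the queue and the callback are hypothetical stand-ins, and this ignores the trustless part entirely (nothing here checks that a device returned the real page).

```python
from collections import deque
from typing import Callable, Optional


def crawl_round(urls: deque, devices: list[str],
                fetch: Callable[[str, str], Optional[str]]) -> dict[str, str]:
    """Hand each device at most one URL and collect whatever comes back."""
    results: dict[str, str] = {}
    for device in devices:
        if not urls:
            break
        url = urls.popleft()
        body = fetch(device, url)
        if body is None:
            # Device flaked: put the URL at the back of the queue so another
            # device (in this round or a later one) can pick it up.
            urls.append(url)
        else:
            results[url] = body
    return results


def fake_fetch(device: str, url: str) -> Optional[str]:
    # Simulated network: device "d1" always drops its request.
    return None if device == "d1" else f"<html>{url}</html>"


queue = deque(["u1", "u2", "u3"])
first = crawl_round(queue, ["d1", "d2", "d3"], fake_fetch)
print(sorted(first), list(queue))  # ['u2', 'u3'] ['u1']
```

Since each device only ever sees one URL per round, losing a device costs you one retry, not a chunk of the crawl.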
I think the actual problem wouldn't be technical but selection: how do you decide what's worth keeping?
You could just do it on a volunteer basis. One volunteer really likes Lizard Facts and volunteers to host that. Or you could dynamically generate the "desired semantic subspace" based on the search traffic...
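Here's one rough way to mix the two approaches, assuming you can already tag each logged query with a topic label (that classifier is its own hard problem and isn't shown). Volunteers pin topics with the storage they donate, and whatever budget is left gets split in proportion to each topic's share of search traffic. `allocate_storage`, `pinned`, and the GB budget are all hypothetical.

```python
from collections import Counter
from typing import Optional


def allocate_storage(query_topics: list[str], budget_gb: float,
                     pinned: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Split a storage budget across topics.

    pinned: topics volunteers chose to host, with the GB they committed.
    query_topics: one topic label per logged search query.
    """
    plan = dict(pinned or {})
    remaining = budget_gb - sum(plan.values())
    counts = Counter(query_topics)
    total = sum(counts.values())
    if remaining <= 0 or total == 0:
        return plan
    # Leftover budget follows search traffic.
    for topic, n in counts.items():
        plan[topic] = plan.get(topic, 0.0) + remaining * n / total
    return plan


plan = allocate_storage(["lizards", "weather", "weather", "weather"],
                        budget_gb=100, pinned={"lizards": 20})
print(plan)  # {'lizards': 40.0, 'weather': 60.0}
```

So the Lizard Facts volunteer guarantees a floor for that topic, and traffic decides the rest.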