| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by danShumway 2418 days ago

> If instead, the search algorithm loops through data in a cached index server, that's no longer "search in a distributed way" that the gp was originally wondering about.

So DNS isn't distributed because my computer caches queries?

I think this is arguing semantics rather than practicalities. Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself. What we care about is the ability to aggregate search results from multiple places, to bypass search if we have a specific video URL that's being shared, and to build our own search engines without running into copyright problems.

If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?

> So, the index server with the "good experience" as perceived by users will be the one that also includes the actual videos -- basically acts as a CDN -- and this emergent behavior of user preferences defeats the decentralized ideals of p2p video.

My reading of this argument is I might as well just host my blog on Medium, because Google search is just another point of centralization. And after all, for speed reasons users will prefer to use a search engine that hosts both the blog and the search results -- so eventually Google search is definitely going to lose to Medium anyway.

But of course Medium isn't going to unseat Google, because in the real world speed improvements are relative, and at a certain point users stop caring, or at least other concerns like range of accessible content and network effects begin to matter a lot more.

2 comments

synctext 2418 days ago

> Centralization isn't binary -- it's a continuum

It's both I would argue. Distributed systems professor here. My lab has been working on a "academically pure" distributed Youtube for 14 years and 7 months now. That means no central servers, no web portals, and no discovery website. Pure Peer-to-Peer and lawyer-proof hopefully. Distributing everything usually means developer productivity drops by roughly 95%. Plus half of our master-level students are not capable of significantly contributing. Decentralised==hard. This is something the "Distributed Apps" generation is re-discovering after the Napter-age Devs got kids/s

> All there needs to be done is to expose a static, daily generated JSON file that contains all videos on the instance.

Or simply make it real-time gossip. Disclaimer; promoting our work here. We implemented a semantic clustered overlay back in 2014 for decentralised video search, that could make it just as fast as Google Servers[1]. This year we finished implementing a real-time channel feed of Magnet links protocol + deployment to our users. Our 51k concurrent users ensure that we can simply re-seed a new Bittorrent hash with 1 million hashes, then everybody updates. Complete research portfolio, including our decentralised trust function [2].

> does anyone actually care if it's technically decentralized?

That is an interesting question. Our goal is real Internet freedom. In our case, logically decentralisation is a hard requirement. Our users often don't care. Caching servers quickly introduce brittleness into your architecture and legal issues.

[1]https://www.usenix.org/system/files/conference/foci14/foci14... [2]https://github.com/Tribler/tribler/wiki#current-items-under-...

link

jasode 2418 days ago

>So DNS isn't distributed because my computer caches queries?

Again, I'm not talking about a technical engineering component. I'm talking about users aggregate behaviors. Please see my other reply of how we seem to be talking at different abstraction levels.

>Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself.

Right, but that's not what I'm arguing. I'm talking about centralization as a emergent phenomenon that bypasses the ideals decentralized protocols that the protocol's designers didn't intend.

>If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?

I guess I don't understand the premise then because if that were true, why would the adjective "distributed" even be mentioned in the question "search/discovery in a _distributed_ way?" To me, something about distributed/decentralized as a characteristic in the technical implementation is very important to the person asking the question.

EDIT: here's another example of that type of "search without central indexing server" question: https://news.ycombinator.com/item?id=20282397

link

danShumway 2418 days ago

> I'm talking about users aggregate behaviors.

So am I.

For example, Github currently hosts the majority of Git repositories online, and I've heard people argue that this means Git isn't really decentralized, because the user behavior is to stick everything into a central repository on a central server. But when Microsoft bought Github, lots of people migrated to Gitlab, and (issues notwithstanding) it was easy for them to do so because of Git's distributed architecture. Git was decentralized enough that pivoting from a bad event was still way easier than it would have been with a different architecture.

When I talk about decentralization as a practical concern, I'm not worried about users aggregating around good services. I'm worried about whether the architecture supports moving away from or augmenting those services if something goes wrong in the future.

And what I mean when I talk about centralization as a continuum is that the social aggregated behaviors you're worried about are still strictly better under a PeerTube system than they are under a Youtube system -- so there's no point in bashing PeerTube just because it doesn't solve literally every problem.

If I'm removed from a centralized PeerTube indexing service, my video is still online under the same URL, and I can still point users at a different indexing service. If censorship becomes problematic or widespread, users will move to different indexes because the network lock-in of an indexer is less than the lock-in of a social platform. As far as speed concerns go, users can fall back on slower indexers only when fast ones fail. All of this is workable.

But if I'm removed from Youtube, I have to start over from scratch with a new URL on a different site with different features that doesn't play nicely with any of the existing tools or infrastructure.

> I'm talking about centralization as a emergent phenomenon

The emergent phenomenon you're talking about is that sometimes better, faster services have more users than bad services. That's not a problem with decentralization, and that's not a problem decentralization is trying to solve. Decentralization is only trying to mitigate the harmful effects of that phenomenon.

It is not a desirable goal of decentralization to make every node in a graph have the same traffic levels -- and I mean that both on a technical and on a cultural level.

link

jasode 2418 days ago

>When I talk about decentralization as a practical concern, I'm not worried about users aggregating around good services. I'm worried about whether the architecture supports moving away from or augmenting those services if something goes wrong in the future.

I understand your point here but this sounds more like a technical detail and not about social power structure. To your point, I'd also say the combination of DNS and http protocols already allow for people to move their content around the internet (keep the same url) and yet people do care about aggregation around platforms because they don't like concentration of power. So even though you state you don't worry about it, others do. I believe reducing platform power is part of the motivation for p2p video.

>And what I mean when I talk about centralization as a continuum is that the social aggregated behaviors you're worried about are still strictly better under a PeerTube system than they are under a Youtube system -- so there's no point in bashing PeerTube just because it doesn't solve literally every problem.

Btw, I'm not "bashing" Peertube. Instead, I'm trying to emphasize that it would be a mistaken belief to think that a p2p video protocol can stop defacto centralization. (E.g. see history of http protocol on why that doesn't happen.) Instead of thinking about what's technically possible with cache index servers, we should think about what humans typically do that inadvertently recreates centralization that nobody seems to want. A quality cache index server can create a feedback loop that attracts both users and video uploaders which weakens decentralized p2p nodes. If that particular cache server's popularity doesn't really matter because p2p nodes will always be able to independently exist, then that means today we can also say that Youtube doesn't matter because you can already serve videos (AWS, Azure, home server) independently outside of Youtube.

>If I'm removed from a centralized PeerTube indexing service, my video is still online under the same URL, and I can still point users at a different indexing service. If censorship becomes problematic or widespread, users will move to different indexes because the network lock-in of an indexer is less than the lock-in of a social platform.

But people can make the same argument about Google's index search results. E.g. it doesn't matter if your blog or niche pet store got removed from the page 1 of the search results because you can theoretically point users to a different indexing service (Bing, or roll-your-own index ranking algorithm with Common Crawl dataset, etc). The content at the url domain you already own is still at that url. But we both know that answer (while true in a sense) does not satisfy people. Website owners get very upset when they lose ranking or get removed (censorship) from search results altogether. Even though there are technical solutions for people to not use "google.com", it's irrelevant when their mental framework is "power & influence" of Google.

>The emergent phenomenon you're talking about is that sometimes better, faster services have more users than bad services. That's not a problem with decentralization, and that's not a problem decentralization is trying to solve. Decentralization is only trying to mitigate the harmful effects of that phenomenon.

I think I disagree with that but let me expand. If the goal of decentralization is some diversity (e.g. some niche content has a place to serve video outside of Youtube) then your paragraph makes sense. However, if it's the more ambitious idea of "replace Youtube", then yes, it's a huge problem of decentralization that it can't be as fast/convenient/quality as centralized services for normal users. If most mainstream users are avoiding decentralized services because it "didn't solve problems it doesn't claim to solve" -- does it mean decentralization "succeeded"? I guess there's semantic wiggle room there.

>It is not a desirable goal of decentralization to make every node in a graph have the same traffic levels

I never claimed equal traffic was desirable and that seems to be an uncharitable reading of my points.

link

boomlinde 2417 days ago

> Instead of thinking about what's technically possible with cache index servers, we should think about what humans typically do that inadvertently recreates centralization that nobody seems to want.

The comment you link to above makes a technical argument. It asserts what you believe is and isn't technically possible. In that sense I feel like you are moving the goal posts.

link