by benkant 4022 days ago
I had a rant about this on Twitter yesterday[0].

Google makes money precisely because the web is centralised. If we moved to P2P systems they could still provide an index, but I'd wager far less data would ultimately pass through them. Not to mention that if people were more in control of their data rather than LinkedIn, Facebook, Flickr, YouTube, I bet they'd be less inclined to have it indexed publicly simply because they have a choice.

There's all sorts of network effects and shitty incentives at play, and it's a shame.

My twitter rant (not the whole thing even)

> though they have their place, centralised systems reinforce the role of the middle man, which is prime for rent-seeking and lopsided value

> in reality networks exist on a continuum between centralised/decentralised. The web makes it difficult to choose the correct degree per case

> both centralised and decentralised systems have trust issues, but they are different in kind not magnitude

> the incentives are wrong for innovation. Google requires centralised, so HTTP is fine. Ubiquity, so HTML is fine

> web developers have spent years becoming skilled in their corner and are incentivised to defend and perpetuate the platform

> if you think discoverability, zero-install and sandboxes are only possible on the web, I invite you to consult the literature

> we could have decentralised, secure, simple, efficient primitives, but network effects and incentives steer us away

> tech solutions are moot unless they incentivise behaviour that leads to better returns for everyone. layers on HTML/HTTP will never do that

[0] https://twitter.com/benkant

2 comments

"Google makes money precisely because the web is centralised"

I see it exactly the other way: Google makes money precisely because the web is decentralised, and hence you can create an invaluable service by crawling it and creating a centralised search index.

That's another way to look at it, but you're using decentralised to indicate there are many large nodes- I would say that's simply distributed. I'm using centralised to indicate that communications are via those large nodes, which mostly don't communicate with each other.

It's client/server on a massive scale. Just because the servers are public, doesn't make it decentralised.

In what way do you think the web is centralized?
That's a good question which deserves an honest answer but this comment box is really too small for that and essay sized comments are frowned upon.

For starters: navigating the web in the beginning consisted of clicking links which caused you to go from one website to another. This all worked well when (a) the web was small and (b) there were hardly any trash pages.

Search engines changed that, and once they got 'good enough' the link graph became a mere starting point for crawling the web rather than the way we navigated from site to site. For a little while the link graph was used as a popularity measurement but this too changed (because of the huge number of low value links).

Then we got silos. A 'silo' is a bunch of data locked up under a trade between users and large web properties. The trade is 'you give us your content and a bunch of information about yourself and we'll use that content to attract others and to sell ads'.

Examples of such silos are Google, Yahoo and Facebook.

Finally, if the web (and the internet itself) was originally strung together by a peer-to-peer approach, it turned more and more into a division between producers and consumers, with the producers on the 'server' side and the consumers on the 'client' side.

Mobile devices accessing the net further accelerated this trend. Right now the only internet (not web) applications that are still peer-to-peer are torrent applications. For the most part the division on the web is complete, and hosting a web server on your very powerful cable modem or DSL line would be grounds for termination of your access.

Servers are hosted centrally and are operated by companies whereas clients are simply terminals that access the content stored on those servers.

I hope that answers your question in enough detail, you could easily write a book about this.

The internet is, by its nature, peer to peer and decentralized. Cut a cable, or take out a large network, and the internet will route around it, either quickly (routes converging on a new peer) or slowly (a poorly connected network finding a new upstream to purchase connectivity through). That companies then build on top of this and implement services where they are the middle of both connections does not change this fundamentally, it just adds an optional layer. To assume our connections have upstream bandwidth that is never or rarely used is false. I would argue that we generate more content per person than ever in history. The sheer amount of pictures, videos, webcams, posts and comments is much higher than ever before. Are they hosting it directly from their connections? Usually not, but that's as much a case of being efficient and reaching an audience as it is of companies wanting control over the data. Even then, there are services which are decentralized from that, such as email. It's not efficient to host content yourself. Even the large networks use dedicated CDNs. For the end user, Facebook is a CDN.

That said, I agree there is a clear move towards our data and services being handled by fewer, and larger entities, such as Google, Yahoo, Microsoft, Apple, Amazon. But they aren't a single entity, and I don't consider that centralized. Any one of those providers could implode today, and very little of their services could not be picked up by some competitor easily. I don't consider that centralized.

We call them datacenters for a reason. When I received mail in '95 or so the machine receiving it was the workstation I wrote the reply on.

Your peer-to-peer view of the internet died roughly in '98.

> We call them datacenters for a reason.

And there are many of them, some owned by companies that use them exclusively, some conglomerations of many different providers but owned by yet another party. How is this centralization? I still think you're just arguing that we've compartmentalized certain services to sets of companies, for the most part, but even that isn't centralization, because there are multiple distinct companies using multiple distinct networks and in many cases they are presenting multiple distinct capabilities. Not having something handled at the end point does not mean it's centralized, there's a very large middle ground here, and that's where we are currently at. I'm not sure I see any evidence that we are moving away from that towards actual centralization.

> When I received mail in '95 or so the machine receiving it was the workstation I wrote the reply on.

And many people that used POP3 continued to do so well into the 2000's. It's silly to run a mail server on your workstation. I know, I did it for years myself. You run into all sorts of stupid problems related to your workstation not being always on, badly configured backup MX servers, and other issues. We don't do it anymore not because we were forced out of it (you can still do it now), but because there are solutions that are better for most use cases, and we opt for those.

We don't all wash our own cars, or do our own plumbing, or even clean our own houses. Some people do, some people pay others to do that work. The fact they pay others doesn't mean we've moved towards centralizing those services. There isn't some national bureau of plumbing that is our only recourse when the toilet is clogged and we don't want to fix it ourselves.

Ok. So you say we're not trending towards a more centralized internet because you discard all proof that that is exactly what is happening. That's fine with me but it really doesn't help to move the discussion forward.

The reasons why we are moving to a more centralized internet are what is interesting, such as - you rightly identified those - that stuff isn't always powered up and that keeping a mailserver up and running is work and so on.

But none of that changes that centralization is happening.

Multiple distinct companies != peer-to-peer internet. That's what a decentralized network infrastructure used to mean, where the 'peers' were equals.

Nowadays it means clients in one camp and servers in another, and large scale consolidation of those servers in the datawarehouses of a relatively low number of companies serving up the bulk of the data. If that trend continues it's not a bad or a good thing per se, but it would be good to stop and think about how desirable that is.

So from that point of view a lot of centralization has already happened.

Everybody running their own mailserver: could be a good thing, presuming they can be made easy to set up and easy to maintain (I don't see any technical reason why not). Ditto webhosting, why should facebook host all your content (or google, or Yahoo).

In the end, convenience won over 'peer-to-peer', there are many reasons besides convenience (firewalls, for one) but the results are here and we'll have to live with it (except for a couple of die-hard hold-outs).

I think the accurate statement of your opinion is not "the web is centralized" but rather, "Zipf's law sucks."

In decentralized networks there end up being accumulation points, and Zipf's law (which shows up in piles of different contexts, originally noticed in rank of words used in languages) gives a pretty good idea of how that accumulation plays out in basically an L-shaped curve. Point being that it might have a lot more to do with the structure of human networks and attention than with choice of wire protocols...
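
To put rough numbers on that L-shaped curve, here is a minimal sketch (not from the thread). It assumes the textbook form of Zipf's law, where the node at rank r gets attention proportional to 1/r^s; the function name `zipf_shares`, the exponent s = 1 and the 1000-node population are all illustrative assumptions.

```python
def zipf_shares(n, s=1.0):
    """Share of total attention for ranks 1..n, assuming weight 1 / rank**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical population of 1000 sites competing for the same attention.
shares = zipf_shares(1000)
top10 = sum(shares[:10])

print(f"rank 1 share:    {shares[0]:.1%}")
print(f"top 10 share:    {top10:.1%}")
print(f"bottom 500 share: {sum(shares[500:]):.1%}")
```

Nothing in this model mentions a wire protocol: the concentration at the head comes purely from the assumed rank distribution, which is the point being made above.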

The nature of http and websites makes the web centralized: there's always a server, users don't really serve data, it's always stored somewhere.

It's true that it's decentralized, that it's easy to create websites, but in nature, if you shut down dns servers, you shut down 99% of the internet, which includes HTML websites.

And I think that a decentralized web might be more easy to index (proof of work system, etc).

That's not centralized, it's just less decentralized. Centralized and decentralized are on opposite ends of the spectrum. It's possible to be less decentralized and still be very far from centralized. There are many, many different entities providing all sorts of services, so I'm not sure how that portion can be seen as centralized at all. DNS, as you note, is probably the most centralized single point that everything relies on, but they simply have authority because we give them authority. If DNS server administrators decided to use different root servers, there's not a lot they can do about that. But I'll concede that authoritative DNS is fairly centralized, given it requires checking with a single authority, but even then, many entities (TLDs) have a say in what that authority says (but not the ultimate say).

Well you're right, in nature and architecture the internet is decentralized, but the use most users make of it is centralized.

If you look at what internet.org attempted to do, that's actually how the internet is used most of the time. For consumers and most small businesses, internet is centralized. Technically, most of the internet is just http requests, meaning that there will always be this duality of servers and clients. Without web servers and their admins, there is nothing, and that's a form of control in my opinion: you can easily shut down a website.

I still don't see that. A centralized internet, or even a centralized "web" as has been distinctly defined elsewhere here, implies a single authority. That doesn't exist, and I don't see it existing in the future. Which email provider do you want to use? Pick from hundreds. Which social network do you want to use? Pick from tens of candidates. Which blog platform do you want to use? Pick from hundreds again.

> Without web servers and their admins, there is nothing, and that's a form of control in my opinion: you can easily shut down a website.

There are webservers, and admins. That hasn't changed. There's been a shift to larger sites, but there's still plenty of small ones. You still have the option to put your site at many different locations, or use a platform such as Facebook, Blogger or Wordpress.

Look, here google is trying to solve the problem of government surveillance and security. Web servers are a very weak point because you can shut them down if you have the law on your side, and recently the law has been abusive. And even if you can change your DNS, the root servers are still an important part of the internet, and they're subject to control and legal issues. Control and authority makes those aspects of the internet centralized. This applies to your hundreds of mail and web providers, which are not free by the way (datacenters). Decentralized technologies are entirely free.

What I'm talking about, is protocols that make services impossible to shut down, like bittorrent or bitcoin. That's what I mean by a decentralized internet. Those technologies are different and were made especially with the goal of avoiding control, and they are exactly the solutions to breaches of privacy. Here every computer is equal, and that's a true decentralized internet, in term of hardware AND software. What I was talking about, is generalizing bitcoin and bittorrent to messaging or even hosting databases.

Such software would run on many domestic computers that want to use it and host chunks of data in a redundant manner. The issue is authenticity and signing of data. But other than that, that's where the future is.
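
A minimal sketch of that idea (not from the thread): data is split into chunks, each chunk is addressed by its SHA-256 hash, and every chunk is stored on more than one peer. The peer names, chunk size, round-robin placement and all function names are assumptions made for illustration. Note that hashing only covers integrity; authenticity would still need something like a signed list of chunk IDs, which is left out here.

```python
import hashlib


def split_into_chunks(data: bytes, size: int):
    """Cut data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk: bytes) -> str:
    """Content address: the SHA-256 hex digest of the chunk itself."""
    return hashlib.sha256(chunk).hexdigest()


def place_chunks(chunks, peers, replicas=2):
    """Store each chunk on `replicas` consecutive peers, round-robin."""
    stores = {peer: {} for peer in peers}
    for index, chunk in enumerate(chunks):
        cid = chunk_id(chunk)
        for r in range(replicas):
            peer = peers[(index + r) % len(peers)]
            stores[peer][cid] = chunk
    return stores


def fetch(cid, stores, offline=()):
    """Return the first copy whose hash matches its ID, skipping offline peers."""
    for peer, store in stores.items():
        if peer in offline:
            continue
        chunk = store.get(cid)
        if chunk is not None and chunk_id(chunk) == cid:
            return chunk
    return None


# Hypothetical home machines acting as peers.
peers = ["alice", "bob", "carol"]
data = b"a message hosted on domestic computers instead of one server"
chunks = split_into_chunks(data, 16)
manifest = [chunk_id(c) for c in chunks]
stores = place_chunks(chunks, peers)

# One peer going offline does not lose any chunk.
rebuilt = b"".join(fetch(cid, stores, offline={"alice"}) for cid in manifest)
print(rebuilt == data)
```

Because a chunk's ID is its hash, a peer that returns altered bytes is detected and skipped, so no single machine has to be trusted to serve correct data.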

I'm sorry but I can't trust the html/http web one bit. HTML and javascript are awful technologies which are slow to parse, building web browsers has been a race that resulted in no interesting progress, and the web2.0 has been a joke. All those techs have been the base google has been making its money on, which also makes it easy to mine, so to me centralization is a privacy issue.

Also DNS servers are a pretty good example of centralized internet. Without 8.8.8.8 your browser turns clueless pretty quick:)
No, my browser doesn't. Google's public DNS has little bearing on how I reach sites, unless I've specifically configured it that way. Either you really don't understand how DNS works, or you are simplifying to the point of just plain being wrong.

You could argue that the root servers are too centralized, and that their control constitutes centralized DNS control, but since the only reason they have control is that all the different DNS servers use them as authorities, an argument could also be made that their control is more by convention than anything else, and all it would take is a competitor to ICANN that added some value, and eventually we could have multiple authorities. Whether that would be beneficial or detrimental is another discussion.
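
To illustrate the "by convention" point, here is a toy model (not from the thread, and not real DNS): a resolver starts from whichever root it has been configured with and follows referrals down to an answer. Every server name, the `.p2p` suffix and the `resolve` function are made up; the addresses come from the reserved documentation ranges.

```python
# Toy delegation data. "refer" points at another server, "answer" is final.
SERVERS = {
    "root-a": {"com": ("refer", "tld-com")},
    "alt-root": {"com": ("refer", "tld-com"), "p2p": ("refer", "tld-p2p")},
    "tld-com": {"example.com": ("refer", "ns-example")},
    "tld-p2p": {"blog.p2p": ("answer", "203.0.113.7")},
    "ns-example": {"www.example.com": ("answer", "192.0.2.10")},
}


def resolve(name, root_hint, servers=SERVERS, max_hops=8):
    """Follow referrals from the configured root until an answer or a dead end."""
    server = root_hint
    labels = name.split(".")
    for _ in range(max_hops):
        zone = servers[server]
        match = None
        # Try the longest suffix of the name first, then shorter ones.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in zone:
                match = zone[suffix]
                break
        if match is None:
            return None  # this server knows nothing about the name
        kind, value = match
        if kind == "answer":
            return value
        server = value  # referral: ask the next server down
    return None


print(resolve("www.example.com", "root-a"))
print(resolve("blog.p2p", "root-a"))
print(resolve("blog.p2p", "alt-root"))
```

The only thing that makes `root-a` authoritative in this model is the `root_hint` argument. Swap it and the same resolver sees a different namespace, which is the sense in which the control rests on what resolvers are configured to trust.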

You could probably argue that Google's near-monopoly on search is a form of centralized control.
As much as people like to bandy that term about, I don't think of (less than) 68% of all searches as a monopoly. Two out of three people is a lot, but it's not nearly enough to force some sort of information control (whether that information is search results, or other people exclaiming how much better their search engine is working).
It's distributed, but on the continuum of centralised - decentralised it is definitely centralised. How did my comment get from my computer to yours?
Through a complex interrelationship of distinctly controlled networks that advertise routes and addresses and allow traffic based on complex business relationships (peering). The only case where that's not happening is where we both have the same ISP, and ycombinator happens to be hosted there as well. Running a traceroute from myself to news.ycombinator.com, I count two distinct networks not including my local one, and not including Cloudflare. If those networks stopped talking to each other, my packets to hacker news would find another route, assuming my first hop had access to other networks (given time for the networks to determine a new route and my first hop had access to other peers).
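
A rough sketch of that rerouting behaviour (not from the thread): networks are nodes, peering links are edges, and a breadth-first search stands in for route selection. Real inter-network routing is policy-driven rather than shortest-path, and the network names below are invented, so treat this only as an illustration of the "find another route" claim.

```python
from collections import deque


def find_route(links, src, dst):
    """Return one fewest-hop path from src to dst over undirected links, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical topology: a home ISP with two upstreams.
links = [
    ("home-isp", "transit-a"),
    ("home-isp", "transit-b"),
    ("transit-a", "cdn"),
    ("transit-b", "exchange"),
    ("exchange", "cdn"),
    ("cdn", "site"),
]

print(find_route(links, "home-isp", "site"))

# Two networks stop talking to each other: traffic takes the longer way round.
broken = [l for l in links if l != ("transit-a", "cdn")]
print(find_route(broken, "home-isp", "site"))

# If the first hop has no other upstream left, there is no route at all.
isolated = [l for l in broken if l != ("home-isp", "transit-b")]
print(find_route(isolated, "home-isp", "site"))
```

The last case matches the caveat above: rerouting only works if the first hop still has access to some other network.
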
We're talking about the web as an application layer protocol. By your definition everything that happens on the internet is decentralised. That's not untrue if you look at it from the point of view of TCP/IP, but that's tangential to the conversation we're having.

You seem to be conflating the web with the internet.

But even by that definition, the web isn't a single application, it's many applications, some of them compartmentalized (search, social), some of them not (email), and some in between (websites/blogs). If an application were centralized, I would expect a single provider you had to use, but instead, where it is at least compartmentalized, you have a group of providers. Can you name a single service/application that you expect more than 5% of people use that has only a single provider? For search, you have Google, Yahoo, Bing, and other smaller players. Google is dominant here, but still has less than 68% of the market. For social, Facebook is the dominant player, but you yourself used a different social network to communicate on this subject, and there are many other providers with popularity that ebbs and flows. It's the same with anything I can think of. I'm not sure how this is considered centralized under any definition.
Yep, the web is a distributed system. Yep, the web offers many services, and many providers offer the same class of service.

However, each and every one of those services are centralised in a technical sense on account of HTTP. Why might an alternative be useful? Consider the solution the Google service we're addressing is putting forward cf. Content Addressable Networking systems[0]. I can't spend any more time explaining, sorry. This might help- note the levels of centralisation in each generation of P2P systems:

https://www.cs.cmu.edu/~dga/15-440/F12/lectures/p2p-approxim...

[0] http://en.wikipedia.org/wiki/Content_addressable_network

To take Google as an example: 92% market share in Europe in 2014 [1], 81% of the global market for smartphones (Android) [2] - 96% if you also add the single relevant competitor iOS. None of this is technically centralisation. (And won't ever be, as you could always "decentralize" the web by running your own personal search engine on your home box. As long as someone is using it, google doesn't have 100% market share.) However, it doesn't make much of a difference when you want to develop an app that doesn't get accepted into the iOS or Android app store.

But all of this is obviously beside the point that the OP made. Even if you don't want to develop a search engine or a phone app, you still have to tie your users to a central "cloud" service and web site so you can get discovered by google. That's a huge disincentive for p2p services.

[1] http://uk.businessinsider.com/heres-how-dominant-google-is-i...

[2] http://www.idc.com/getdoc.jsp?containerId=prUS25450615