


	by jeffbee 893 days ago
> they need to provide non-blocking or near non-blocking performance within the availability zone

I see you've never tried GCP.


amluto 893 days ago

I have, but not for a use case where this matters.

FWIW, Google has been working on these fancy nonblocking networks for a very long time. They’re very proud of them. Maybe they don’t actually use them for GCP, but Google definitely cares about network performance for their own purposes.


jeffbee 893 days ago

The whole concept of blocking is inapplicable to packet-switched networks. The whole time I was there I never heard anyone describe any of their several different types of networks as non-blocking. Indeed, the fact that they are centrally-controlled SDNs, where the control plane can tell any participant to stop talking to any other participant, seems to be logically the opposite of "non-blocking", if that circuit-switching term were applicable.

Your message seems to imply that these datacenter networks experience very little loss, and this is observably far from reality. In GCP you will observe levels of frame drops that a corporate datacenter architect would consider catastrophic.


dekhn 893 days ago

Blocking is a common concept in packet-switched networks; for example, a packet switch with a full crossbar can be called "non-blocking". A switch is either going to queue or discard packets, and at the rates we're discussing, there is not enough buffer space, so typically if a switch gets overloaded it's going to drop low-priority packets. Obviously many things have changed: there are Ethernet pause frames and admission control and SDN management of routes, but we still very much use the term "blocking" in packet-switched networks.

What Google decided long ago is that for their traffic patterns, it makes the most sense to adopt Clos-like topologies (with packet switching in most cases), and not attempt to make a fully non-blocking single crossbar switch (it's too expensive for the port counts). More hops, but no blocking.
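To put rough numbers on the crossbar-versus-Clos trade-off, here is a minimal sketch using the textbook three-stage Clos model from circuit switching: r ingress switches with n inputs each, m middle switches, and r egress switches. The function names and the 1024-port example are illustrative assumptions, not anything Google has published, and real datacenter fabrics are folded, packet-switched variants that this model only approximates.

```python
def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints in a single ports x ports crossbar."""
    return ports * ports


def clos_crosspoints(n: int, r: int, m: int) -> int:
    """Crosspoints in a three-stage Clos network.

    r ingress switches of size n x m, m middle switches of size r x r,
    and r egress switches of size m x n.
    """
    return 2 * r * n * m + m * r * r


def clos_blocking_class(n: int, m: int) -> str:
    """Classic Clos conditions on the number of middle switches m."""
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


if __name__ == "__main__":
    n, r = 32, 32           # hypothetical: 32 ingress switches x 32 ports = 1024 ports
    m = 2 * n - 1           # smallest m that is strictly non-blocking
    print(crossbar_crosspoints(n * r))   # single crossbar
    print(clos_crosspoints(n, r, m))     # three-stage Clos
    print(clos_blocking_class(n, m))
```

For this hypothetical 1024-port case the Clos arrangement needs roughly a fifth of the crosspoints of a single crossbar, at the cost of two extra stages per path.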

Scaling that got very difficult and so now many systems at Google use physical mirrors to establish circuit-like paths for packet-like flows.

GCP is effectively an application that runs on top of Google's infrastructure (I believe you already worked there and are likely to know how it works) that adds all sorts of extra details to the networking stack. For some time the network was a user-space Java application that had no end of performance problems.


jeffbee 893 days ago

The whole word smacks of Bellhead thinking. With ethernet you put a frame on the wire and hope.


amluto 893 days ago

The term “non-blocking” may well originate with circuit switching, but Ethernet switches have referred to the behavior of supporting full line rate between any combination of inputs and outputs as “non-blocking” for a long time. (I wouldn’t call myself an expert on switching, but I learned IOS before there was a thing called iOS, and I think this usage predates me by a decent amount.)
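As a rough illustration of that Ethernet usage, one common back-of-the-envelope check is the oversubscription ratio of a switch: total host-facing bandwidth divided by total uplink bandwidth. A ratio of 1.0 or less means every downlink can run at line rate toward the uplinks at once. The port counts below are hypothetical, and this sketch ignores internal fabric and buffer limits.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Ratio of host-facing capacity to uplink capacity for one switch."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def is_non_blocking(ratio: float) -> bool:
    """Treat a ratio of at most 1.0 as non-blocking in this simplified sense."""
    return ratio <= 1.0


if __name__ == "__main__":
    # Hypothetical top-of-rack switch: 48 x 10G down, 4 x 40G up.
    tor = oversubscription_ratio(48, 10, 4, 40)
    print(tor, is_non_blocking(tor))

    # Hypothetical balanced switch: 16 x 100G down, 16 x 100G up.
    balanced = oversubscription_ratio(16, 100, 16, 100)
    print(balanced, is_non_blocking(balanced))
```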

> With ethernet you put a frame on the wire and hope.

This is not really true. With Ethernet, applications and network stacks (the usual kind — see below) ought to do their best to control congestion, and, subject to congestion control, they put frames on the wire and hope. But network operators read specs and choose and configure hardware to achieve a given level of performance, and they expect their hardware to perform as specified.

But increasingly you can get guaranteed performance on Ethernet even outside a fully non-blocking context or even performance exceeding merely “non-blocking”. You are fairly likely to have been on an airplane with controls over Ethernet. Well, at least something with a strong resemblance to Ethernet:

https://en.m.wikipedia.org/wiki/Avionics_Full-Duplex_Switche...

There are increasing efforts to operate safety critical industrial systems over Ethernet. I recall seeing a system to allow electrical stations to reliably open relays controlled over Ethernet. Those frames are not at all fire-and-hope — unless there is an actual failure, they arrive, and the networks are carefully arranged so that they will still arrive even if any single piece of hardware along the way fails completely.

Here’s a rather less safety critical example of better-than-transmit-and-hope performance over genuine Ethernet:

https://en.m.wikipedia.org/wiki/Audio_Video_Bridging

(Although I find AVB rather bizarre. Unless you need extremely tight latency control, Dirac seems just fine, and Dirac doesn’t need any of the fancy switch features that AVB wants. Audio has both low bandwidth and quite loose latency requirements compared to the speed of modern networks.)


jeffbee 892 days ago

> With Ethernet, applications and network stacks (the usual kind — see below) ought to do their best to control congestion

Exactly. Network endpoints infer things about how to behave optimally. Then they put their frame on the wire and hope. The things that make it possible to use those networks at high load ratios are in the smart endpoints: pacing, entropy, flow rate control, etc. It has nothing at all to do with the network itself. The network is not gold plated, it's very basic.
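For readers unfamiliar with pacing, here is a minimal sketch of the idea: rather than sending a burst back to back, the endpoint spaces packets so that the average send rate matches a target. The function name and the numbers are assumptions for illustration; this is not how any particular production stack implements it.

```python
def paced_send_times(packet_bytes, rate_bps, start=0.0):
    """Return the time (in seconds) at which each packet may be sent.

    Each packet is scheduled after the previous one has "used up" its
    share of the target rate, so the sender never exceeds rate_bps on average.
    """
    times = []
    t = start
    for size in packet_bytes:
        times.append(t)
        t += size * 8 / rate_bps   # serialization time at the paced rate
    return times


if __name__ == "__main__":
    # Three 1250-byte packets paced at 1 Gbit/s: one packet every 10 microseconds.
    for t in paced_send_times([1250, 1250, 1250], 1e9):
        print(f"{t * 1e6:.1f} us")
```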


dekhn 893 days ago

That's an exceptional simplification of modern network approaches.

If the world was Bellhead, ATM would have won.


danpalmer 892 days ago

I think what the parent comment is implying is that at Google a lot of work has been put into making network calls feel as close to local function calls as possible, to the point that it’s possible to write synchronous code around those calls. There are a ton of caveats to this, but Google is probably one of the best in the world at doing this. There is some truly crazy stuff going on to make RPC latency so low.
