by jakedata 206 days ago
Visiting Bletchley Park and seeing step-by-step telephone switching equipment repurposed for computing reinforced my appreciation for the brilliance of the telecommunication systems we created in the past 150 years. Packet switching was inevitable and IP everything makes sense in today's world, but something was lost in that transition too. I am glad to see that enthusiasts with the will and means are working to preserve some of that history. -Posted from SC2025-

I wanted to learn more about computer hardware in college, so I took a class called "Cybernetics" (taught by D. Huffman). I thought we were going to focus on modern stuff, but instead it was a tour of information theory, which included various mathematical routing concepts (kissing spheres/spherical codes, Karnaugh maps). At the time I thought it was boring, but a couple of decades later, when I was working on Clos topologies, it came in handy.

Other interesting notes: learning about the invention of telegraphy and the improvements to the underlying electrical systems really helped me understand communications in the 1800s better. And reading/watching Cuckoo's Egg (with the German relay-based telephones) made me appreciate modern digital transistor-based systems.

Even today, when I work on electrical projects in my garage, I am absolutely blown away by how much people could do with limited understanding and technology 100+ years ago compared to what I'm able to cobble together. I know Newton said he saw farther by standing on the shoulders of giants, but some days I feel like I'm standing on a giant, looking backwards and thinking "I am not worthy".

When the Bell System broke up, the old guys wrote a 3-volume technical history of the Bell System.[1] So all that is well documented.

The history of automatic telephony in the Bell System is roughly:

- Step by step switches. 1920s. Very reliable in terms of failure, but about 1% misdirected or failed calls. Totally distributed. You could remove any switch, and all it would do is reduce the capacity of the system slightly. Too much hardware per line.

- Panel. 1930s. Scaled better, to large-city central offices. Less hardware per line. Beginnings of common control. Too complex mechanically. Lots of driveshafts, motors, and clutches.

- Crossbar. 1940s. #5 crossbar was a big dumb switch fabric controlled by a distributed set of microservices, all built from relays. Most elegant architecture. All reliable wire relays, no more motors and gears. If you have to design high-reliability systems, it is worth knowing how #5 crossbar worked.

- 1ESS - first US electronic switching. 1960s. Two mainframe computers (one spare) controlling a big dumb switch fabric. Worked, but clunky.

- 5ESS - good US electronic switching. Two or more minicomputers controlling a big dumb switch fabric. Very good.

The Museum of Communications in Seattle has step by step, panel, and crossbar systems all working and interconnected.

In the entire history of electromechanical switching in the Bell System, no central office was ever fully down for more than 30 minutes for any reason other than a natural disaster, and in one case a fire in the cable plant. That record has not been maintained in the computer era. It is worth understanding why.

[1] https://archive.org/details/bellsystem_HistoryOfEngineeringA...

The more I study the 5E, the more I see it as a multicomputer or distributed system. The minicomputers were responsible for OAM and for orchestrating the symphony over time, but the communication happens across the CM, which implements the Time/Space/Time fabric, and a sea of microcontrollers. I think this clarification is worthwhile because it drives home your point about faults in the computer era, and by extension the (micro)services era, even more: it's much less a mainframe and much more a distributed system than commonly chronicled, which can be a harder problem, especially with the tooling of the time.

It's actually an 8-volume history (I have all 8 on my shelf); 3 were just on switching systems. You also left out Rotary, which was developed in parallel with Panel.

The museum in Seattle also has a working 3ESS (likely the only one left in the world), and has recently added a DMS-10 as well.

> That record has not been maintained in the computer era. It is worth understanding why.

Go on.

Briefly,

The big dumb switch fabric of #5 Crossbar has no processing power at all, but it has persistent state. The units that have processing power all go down to their ground state at the end of each call processing event, and have no state that persists over transactions. The various processing units (markers, junctors, senders, originating registers, etc.) are all at least duplicated, and usually there's a pool of them. Requests "seize" a unit at random from a pool, the unit does its thing, and the unit is quickly released.

Units have self-checking, and if they fail, they drop out of their pool and raise an alarm. The call capacity or connection speed of the exchange is reduced but it keeps working. Everything has short hardware stall timers which will prevent some unit failure from hanging the exchange.

#5 Crossbar has almost no persistent memory. End offices (for connecting subscriber lines) did not log call info. Toll offices did, but that used an output-only paper tape punch. There's so little state in the switch that matching up call start and call end events was done later in a billing office where the paper tape was read.

The combination of statelessness and resource pools prevented total failure. Errors and unit failures happened occasionally but could not take down the whole switch.
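For readers who think in server terms, here is a rough Python sketch of the pool behaviour described above. It is an illustration only, not how the relay logic actually worked: the `Unit` and `Pool` classes, the `faulty` flag, and the `route:` result string are all made up for the example, and the hardware stall timers are not modeled.

```python
import random


class Unit:
    """Hypothetical stateless processing unit (think of a marker or sender)."""

    def __init__(self, name, faulty=False):
        self.name = name
        self.faulty = faulty  # stand-in for a hardware fault

    def handle(self, digits):
        # Self-check: a broken unit reports failure instead of answering wrongly.
        if self.faulty:
            raise RuntimeError(f"{self.name} failed self-check")
        # Everything computed here is local to this call and is discarded
        # on return, so the unit is back in its ground state afterwards.
        return f"route:{digits}"


class Pool:
    """A pool of interchangeable units; requests seize one at random."""

    def __init__(self, units):
        self.idle = list(units)
        self.alarms = []

    def process(self, digits):
        while self.idle:
            unit = random.choice(self.idle)  # seize a unit at random
            self.idle.remove(unit)
            try:
                result = unit.handle(digits)
            except RuntimeError as err:
                # The failed unit stays out of the pool and an alarm is raised;
                # the call is retried on another unit.
                self.alarms.append(str(err))
                continue
            self.idle.append(unit)  # release the unit quickly
            return result
        raise RuntimeError("no working units left in pool")


if __name__ == "__main__":
    pool = Pool([Unit("marker-0"), Unit("marker-1", faulty=True), Unit("marker-2")])
    for _ in range(10):
        pool.process("5551234")
    print("units still in pool:", [u.name for u in pool.idle])
    print("alarms:", pool.alarms)
```

The point of the sketch is that a bad unit costs capacity, not service: once it fails its self-check it is simply never seized again, and because no unit carries state between calls there is nothing to recover.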

There's plenty of info about #5 Crossbar online, but 1950s telephony jargon is so different from 2020s server jargon that it's not obvious that #5 Crossbar is a microservices architecture.

Thinking about this, this is why Erlang, designed for phone switches, is built around small processes which can fail and be restarted.
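A toy version of that fail-and-restart idea, written in Python rather than Erlang. This is not how Erlang/OTP supervisors are implemented; `supervise`, `make_worker`, and `max_restarts` are hypothetical names for the sketch, and a failed job is simply dropped rather than retried.

```python
def make_worker():
    """Build a fresh worker with its own private state."""
    seen = []  # private to this worker; lost when the worker is replaced

    def worker(job):
        if job < 0:
            raise ValueError("bad job")
        seen.append(job)
        return job * 2

    return worker


def supervise(make_worker, jobs, max_restarts=3):
    """Run jobs on a worker, replacing the worker whenever it fails."""
    worker = make_worker()
    restarts = 0
    results = []
    for job in jobs:
        try:
            results.append(worker(job))
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # too many failures: give up and escalate
            worker = make_worker()  # fresh worker, no inherited state
            results.append(None)    # the failed job is dropped, not retried
    return results, restarts


if __name__ == "__main__":
    print(supervise(make_worker, [1, -1, 2]))
```

As with the crossbar units, the replacement worker starts from a clean state, so whatever went wrong in the old one cannot leak into later jobs.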