Hacker News
by 8fingerlouie 1000 days ago
A long time ago I used to work at a company that made mobile phones. This was in the days before smartphones, and 3G was a shiny new toy. Back then, many companies would implement the protocols in the phone firmware, as opposed to today, when chips usually offload a large part of this.

We did our best to implement the GSM standards[1], but the thing was 19,000 pages long at the time. One of our biggest issues was that while our protocol stack worked well with most cell towers, there would always be some carriers that had configured their network just a tiny bit differently, and we had to tweak our software to match.

At the time, there were 5 major vendors of GSM cells, and none of them had interpreted the standards identically, so we also had tweaks for the different vendors.

I don't remember exactly how many people we employed in our protocol division, but it was more than 100.

I do remember that implementing the first version of Bluetooth took 19 people almost 2 years, and that, at the time, was a much simpler protocol (it probably still is). I can only imagine the "horrors" that have gone into 4G and 5G since then.

These days, all of the above problems are "solved" by simply installing a chip. Qualcomm handles any vendor hardware communication issues, and they're probably "big enough" to also have an impact in the opposite direction, making for a more streamlined protocol landscape.

[1]: https://www.etsi.org/standards/get-standards#page=1&search=&...


This. The standard is also truly horrible to read. It's not really designed to be read, as much as it is an after-the-fact description of what somebody has already done.

To the point that testing against a very wide variety of base-stations (and the multitudinous variety of configuration options in those base-stations) is mandatory.

Which is presumably hard to impossible without flying around the world with test chips and physically being in those places, as I guess only a handful of carriers can and will reveal all their config settings and software stacks in practice.

The more I think about this modem development problem the harder it seems to get. No wonder Apple have struggled and nobody else even seems to try. The amount of tribal knowledge and random hotfixes in the Qualcomm firmware alone must be irreplaceable.

> Which is presumably hard to impossible without flying around the world with test chips and physically being in those places

We had people driving around with our phones logging to a laptop. They basically drove all over the country, trying to cover as much land as possible, and the phone(s) would then attempt to connect to different base stations.

The logs from this would be sent back to the developers to investigate failures, and somewhat often a bug would only manifest itself on a single cell tower.
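
A minimal sketch of the kind of triage such logs make possible, assuming a hypothetical record format (`DriveTestRecord` with just a cell ID and a success flag; real drive-test tools log far more, such as signal levels, timestamps and GPS fixes): count attempts and failures per cell, then flag cells whose failure rate stands out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DriveTestRecord(NamedTuple):
    """Hypothetical log entry: which cell the phone tried, and whether it worked."""
    cell_id: str
    success: bool


def cell_stats(records: Iterable[DriveTestRecord]) -> Dict[str, List[int]]:
    """Count [attempts, failures] per cell ID."""
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for rec in records:
        stats[rec.cell_id][0] += 1
        if not rec.success:
            stats[rec.cell_id][1] += 1
    return dict(stats)


def suspicious_cells(records: Iterable[DriveTestRecord],
                     threshold: float = 0.5, min_attempts: int = 3) -> List[str]:
    """Cells with enough attempts whose failure rate is at or above `threshold`."""
    return sorted(
        cell
        for cell, (attempts, failures) in cell_stats(records).items()
        if attempts >= min_attempts and failures / attempts >= threshold
    )


log = [
    DriveTestRecord("A", True), DriveTestRecord("A", True), DriveTestRecord("A", True),
    DriveTestRecord("B", False), DriveTestRecord("B", False), DriveTestRecord("B", True),
    DriveTestRecord("C", False),  # a single failure is not enough evidence on its own
]
print(suspicious_cells(log))  # ['B']
```

The `min_attempts` cutoff is there because one failed attempt on a tower you drove past once says little; the thresholds are illustrative, not values from any real tool.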

Qualcomm has presumably done this work for most of the world, and has subsequently become the benchmark that telcos calibrate against before deploying cell towers, leading to a more uniform protocol landscape.

If Apple hopes to create a new in-house modem chip, they will either need to calibrate it against Qualcomm, or do the gruntwork of travelling the world. In either case, I'm betting that just using Qualcomm chips, regardless of the price, will be better from an economic perspective.

Furthermore, everything GSM is covered by patents of the "big 5" (Motorola, Nokia, Siemens, Ericsson, I forgot the 5th), which, at least at the time, had free use of each other's patents regarding GSM, but *everybody* else implementing GSM hardware and/or software must pay license fees. Qualcomm and Nokia had a big fight over this 3-4 years ago.

> We had people driving around with our phones logging to a laptop. They basically drove all over the country, trying to cover as much land as possible, and the phone(s) would then attempt to connect to different base stations.

Mobile carriers still do this type of testing, whether via contractors or their own staff. Unfortunately, there's nothing that beats boots-on-the-ground field testing with actual devices (plural), even though it is notoriously unreliable and prone to noise.

It's especially tough in countries with a large landmass (e.g. Canada, USA, Australia, Russia, China, Brazil, etc.).

I had assumed it was done a lot less these days.

One good thing that came out of it was the low-power “GPS” on early smartphones, the kind that would triangulate the phone's position from the cell towers it could “see”. That would not have been possible without someone driving around “everywhere” and recording GPS location alongside cell signal strength.
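
One simple way to turn that survey data into a position fix is a signal-weighted centroid of the visible towers. This is a swapped-in illustrative technique, not necessarily what any phone vendor shipped, and it assumes tower coordinates are already known (for example, estimated from the drive data; `CellObservation` is a hypothetical type):

```python
from typing import List, NamedTuple, Tuple


class CellObservation(NamedTuple):
    lat: float       # surveyed tower position (hypothetical database built from drive tests)
    lon: float
    rssi_dbm: float  # signal strength the phone currently sees from this tower


def weighted_centroid(observations: List[CellObservation]) -> Tuple[float, float]:
    """Estimate the phone's position as a signal-weighted average of visible towers.

    Averaging raw latitude/longitude is only reasonable over small areas.
    """
    if not observations:
        raise ValueError("need at least one visible cell")
    # Convert dBm to linear milliwatts so stronger (usually closer) towers dominate.
    weights = [10 ** (obs.rssi_dbm / 10) for obs in observations]
    total = sum(weights)
    lat = sum(w * obs.lat for w, obs in zip(weights, observations)) / total
    lon = sum(w * obs.lon for w, obs in zip(weights, observations)) / total
    return lat, lon


# The estimate lands very close to the much stronger tower at (55.0, 12.0).
print(weighted_centroid([
    CellObservation(55.0, 12.0, -60.0),
    CellObservation(55.0, 12.1, -80.0),
]))
```

Accuracy is typically hundreds of meters or worse, which is why it was only ever a cheap substitute for real GPS.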

As far as I know, it is used less frequently today than it was a decade ago, but I could be wrong. GPS in smartphones has moved from technology[1] to something you just use without thinking about how it works.

[1] https://www.azquotes.com/quote/343497

> We had people driving around with our phones logging to a laptop. They basically drove all over the country, trying to cover as much land as possible, and the phone(s) would then attempt to connect to different base stations.

Sounds like Apple needs to put test phones inside the cars doing “street view” as well as have Apple Store employees test them.

> and somewhat often a bug would only manifest itself on a single cell tower.

I can easily understand different brands/products/generations having their quirks, but I'm struggling to imagine what the source of uniqueness could be for a single cell tower.

Is it a huge variety of config options, that certain combinations of settings turn out to be rare? Or is it literally just something broken like malfunctioning hardware?

> but I’m struggling to imagine what the source of uniqueness could be for a single cell tower.

GSM is/was a complex beast. Each base station can only handle 8 simultaneous phone calls (the old 2G/3G multiplexed ones, not modern VoIP), so in crowded areas they’re usually configured with a very short range. Some large conferences have had base stations with their range measured in single-digit meters (<30 ft).

Furthermore, as with WiFi, bandwidth is limited, so base stations are deployed in a “beehive-like” pattern: like a triangle with a base station radiating out from each leg, each broadcasting on different frequencies from its neighbors.
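
The "different frequencies from its neighbors" rule can be sketched as coloring a hexagonal grid. Assuming axial hex coordinates `(q, r)` and the smallest reuse cluster of 3 frequency groups, `(q - r) % 3` never gives two adjacent cells the same group. Real networks often use larger clusters (cluster size N = i² + ij + j² gives 3, 4, 7, 12, ...), so treat this as an illustration of the idea only:

```python
# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
HEX_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def channel_group(q: int, r: int) -> int:
    """Frequency group (0, 1 or 2) for the hex cell at axial coordinates (q, r).

    Moving to any neighbour changes q - r by +-1 or +-2, never by a multiple
    of 3, so adjacent cells always land in different groups.
    """
    return (q - r) % 3


def neighbours_conflict(q: int, r: int) -> bool:
    """True if any neighbour of (q, r) shares its frequency group."""
    mine = channel_group(q, r)
    return any(channel_group(q + dq, r + dr) == mine for dq, dr in HEX_NEIGHBOURS)


# Check a patch of the grid: no cell shares a group with any neighbour.
assert not any(neighbours_conflict(q, r) for q in range(-5, 6) for r in range(-5, 6))
```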

That alone leaves a lot of room for configuration errors on each individual base station, but when I say “a single cell tower”, I meant on that drive. The bug might be with a specific firmware version from that base station manufacturer, or that particular hardware revision, or simply a configuration error, or maybe it was a bug in our software and/or radio firmware. There are a lot of “moving parts” that need to be investigated, but from a developer perspective, the error only occurred on one base station.

It could of course also turn out to be a “broken” base station, and often enough we would fail to find the error, and had to contact the network operator to get them to help track down the error.

> Each base station can only handle 8 simultaneous phone calls

Perhaps you are confusing the fact that each GSM transceiver (TRX) provides 8 time-division channels with the call capacity of a whole cell. Most cells (specifically the BTS, in GSM parlance), especially any in a well-populated area, have/had way more than one transceiver. 30-40 was not unheard of in later equipment, though 10 or so was more typical. Late in GSM's life there was another technique to squeeze in more channels, OSC (Orthogonal Sub-Channels).

Furthermore, those 8 TDMA slots could be split in half or in quarters with lower-rate codecs, so it was more than 8 calls per TRX as well.

Anything bigger than a picocell would carry way more than 8 simultaneous calls.
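
To make the arithmetic concrete, a rough sketch: 8 timeslots per TRX, minus some slots reserved for signalling, optionally doubled by a half-rate codec. The default of 2 signalling slots (control channels such as the BCCH and SDCCH) is an assumption; real cells configure this differently, and neither quarter-rate splitting nor OSC is modelled.

```python
TIMESLOTS_PER_TRX = 8  # one GSM TDMA frame has 8 timeslots per carrier


def max_voice_channels(trx_count: int, signalling_slots: int = 2,
                       half_rate: bool = False) -> int:
    """Rough upper bound on simultaneous voice calls in one GSM cell.

    `signalling_slots` is an assumed number of timeslots carrying control
    channels instead of traffic; the real value depends on configuration.
    """
    traffic_slots = trx_count * TIMESLOTS_PER_TRX - signalling_slots
    if traffic_slots < 0:
        raise ValueError("more signalling slots than timeslots")
    # A half-rate codec fits two calls into one timeslot.
    return traffic_slots * (2 if half_rate else 1)


print(max_voice_channels(1))                   # 6: a lone TRX really is "about 8"
print(max_voice_channels(10))                  # 78
print(max_voice_channels(10, half_rate=True))  # 156
```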

> Some large conferences have had base stations with their range measured in single digit meters (<30 ft)

Femtocells are still a thing today. Not so much to do with frequency capacity.

Cellular base stations have lots of configurable settings, and many of them are timing-related. So you've got a thousand places to introduce a 500-mile email problem[0]. Some timing settings can run up against timing constants in the client firmware.

For instance, towers around an airport might have some bands running at reduced power or disabled. This reduces cell size and overlap on those towers, requiring more frequent handoffs between them. A client firmware might have a bug or an unrealistic constant that fails in that frequent-handoff situation. So it's a bug that only happens on some 5G bands near an airport, and only during the summer, because the more arid conditions increase microwave propagation by 1 dB.
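
The arithmetic behind that kind of bug, sketched with a hypothetical firmware constant (`MIN_HANDOVER_INTERVAL_S` is illustrative, not from any real stack): shrinking cells cut the time a moving phone spends in each one, and once that dwell time drops below what the firmware implicitly assumed, the failure appears. The straight-line crossing through the cell centre is a simplifying assumption.

```python
# Hypothetical client-firmware constant: the stack implicitly assumes it will never
# have to hand over more often than this. Name and value are illustrative only.
MIN_HANDOVER_INTERVAL_S = 10.0


def dwell_time_s(cell_radius_m: float, speed_kmh: float) -> float:
    """Rough time to cross one cell, assuming a straight path through its centre."""
    speed_m_s = speed_kmh / 3.6
    return 2 * cell_radius_m / speed_m_s


def handover_too_frequent(cell_radius_m: float, speed_kmh: float) -> bool:
    """True if handovers would come faster than the firmware's assumed minimum."""
    return dwell_time_s(cell_radius_m, speed_kmh) < MIN_HANDOVER_INTERVAL_S


# A car at 100 km/h passing through progressively smaller cells.
for radius in (2000, 500, 100):
    print(radius, round(dwell_time_s(radius, 100), 1), handover_too_frequent(radius, 100))
```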

[0] https://web.mit.edu/jemorris/humor/500-miles

Well, Huawei seems to have succeeded.
Huawei invented most of 5G technology and could tailor the 5G standards to their implementations. Which is also why they were first to market with carrier equipment [1] and had the first CPU with an on-chip 5G modem [2].

[1] https://www.wired.com/story/huawei-5g-polar-codes-data-break...

[2] https://en.wikichip.org/wiki/Kirin_990

No, they didn’t. This is utter garbage. I have worked in cellular testing since 1997. The first to market with pre-5G were NTT DoCoMo and Verizon. Huawei were late to the game. My source is the companies we sold to and I supported personally.

If I ran a company where someone tried to release a 19K-page spec, I'd keep firing people until someone could produce a spec that I could actually review and sign off on, that could also still be reproduced in working technology. How does this kind of thing fly at all?

The thing is, GSM is not a single specification. It is layered like the OSI model, and each layer has multiple specifications for different subsystems.

When GSM was originally specified, there was serious doubt if it would work at all on the hardware available at the time.

A GSM call occupies one timeslot of about 577 µs in an 8-slot TDMA frame (about 4.615 ms): the handset receives in its downlink slot, waits, and transmits in its uplink slot three timeslots later. That’s probably hard enough to achieve on a single cell, but factor in the distance to the tower as well, and the handset has to do a lot of calculations.
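
The distance part is handled in GSM by timing advance: the network tells the handset to transmit early by the round-trip propagation delay, in steps of one bit period (48/13 µs, about 3.69 µs), using a 6-bit value from 0 to 63. A minimal sketch of that arithmetic, rounding to the nearest step (a simplification; it ignores how the base station actually measures the delay):

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
GSM_BIT_PERIOD_S = 48 / 13 * 1e-6  # about 3.69 microseconds per bit
MAX_TIMING_ADVANCE = 63            # timing advance is a 6-bit value in GSM


def timing_advance(distance_m: float) -> int:
    """Timing advance, in bit periods, for a handset `distance_m` from the tower.

    The handset must transmit early by the round-trip delay 2 * d / c so its
    burst arrives inside its timeslot at the base station.
    """
    round_trip_s = 2 * distance_m / SPEED_OF_LIGHT_M_S
    ta = round(round_trip_s / GSM_BIT_PERIOD_S)
    if ta > MAX_TIMING_ADVANCE:
        raise ValueError("out of range for a standard GSM cell (about 35 km)")
    return ta


def max_range_m() -> float:
    """Distance corresponding to the largest timing advance (one-way)."""
    return MAX_TIMING_ADVANCE * GSM_BIT_PERIOD_S * SPEED_OF_LIGHT_M_S / 2


print(timing_advance(10_000))  # 18
print(round(max_range_m()))    # roughly 35 km
```

Each step corresponds to roughly 550 m of distance, which is also why a standard GSM cell tops out at around 35 km.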

Once you start considering handoff between cells things get even more interesting. The handset continuously reports a list of cell towers it can “see” along with the strength of their signals from the handsets position.

Once the cell tower decides that the handset is moving out of range, it propagates a handoff message to its upstream node, which repeats this process until a suitable downstream node is found, and each cell tower is then alerted that the handset with an ongoing call is switching from/to that cell, and finally the handset is informed. The handset has no say in this process other than the list of cell towers.
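
A much-simplified sketch of the network-side decision, assuming the handset's report is reduced to a serving level plus a map of neighbour levels in dBm (real GSM reports use quantized RXLEV values, and real handover algorithms weigh quality, distance and load too). `MeasurementReport` and the 4 dB default margin are hypothetical; the margin is a hysteresis so a call at a cell boundary does not ping-pong between cells.

```python
from typing import Dict, NamedTuple, Optional


class MeasurementReport(NamedTuple):
    serving_level_dbm: float      # level of the current (serving) cell
    neighbours: Dict[str, float]  # neighbour cell ID -> level in dBm, as seen by the handset


def choose_handover_target(report: MeasurementReport,
                           margin_db: float = 4.0) -> Optional[str]:
    """Network-side decision: return a neighbour cell ID, or None to stay put.

    A neighbour must beat the serving cell by `margin_db`, so small
    fluctuations at a boundary do not trigger back-and-forth handovers.
    """
    if not report.neighbours:
        return None
    best_cell, best_level = max(report.neighbours.items(), key=lambda item: item[1])
    if best_level >= report.serving_level_dbm + margin_db:
        return best_cell
    return None


report = MeasurementReport(-95.0, {"cell-17": -93.0, "cell-42": -88.0})
print(choose_handover_target(report))  # cell-42
```

Note that the handset only supplies the report; the decision function runs on the network side, matching the description above.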

All of this took place in the late 1980s; it was finalized in 1987, when a modern processor would have been the Intel 386DX running at 12 to 40 MHz. Obviously no handsets came with a 386 processor, and when I worked on mobile phones in the early 2000s, the norm would be something like an 8 to 12 MHz 16-bit platform.

Since then the specification has been revised hundreds of times. In the original specification, messaging (SMS) was an afterthought: a way of utilizing otherwise unused bandwidth that was normally reserved for command and control. It turned out messaging was a hit with GenX, so many changes were subsequently made to it, like EMS[1], MMS[2] and RCS[3].
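
One visible trace of SMS's modest origins is its size: a single SMS carries at most 140 octets of user data, which is 160 characters in the GSM 7-bit default alphabet (140 × 8 / 7) or 70 in UCS-2. Longer messages are split into parts, each spending 6 octets on a concatenation header, leaving 153 or 67 characters per part. A minimal sketch, assuming the caller already knows the encoding and, for GSM 7-bit, passes the length in septets (extension characters such as `{` count as two):

```python
import math

# Characters that fit in a single 140-octet SMS.
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}
# Characters per part once a 6-octet concatenation header is added.
MULTI_LIMIT = {"gsm7": 153, "ucs2": 67}


def sms_segments(length: int, encoding: str = "gsm7") -> int:
    """Number of SMS parts needed for a message of `length` characters (septets for gsm7)."""
    if encoding not in SINGLE_LIMIT:
        raise ValueError(f"unknown encoding: {encoding}")
    if length <= SINGLE_LIMIT[encoding]:
        return 1
    return math.ceil(length / MULTI_LIMIT[encoding])


print(sms_segments(160))          # 1
print(sms_segments(161))          # 2
print(sms_segments(71, "ucs2"))   # 2
```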

Likewise, the focus shifted from phone calls to data, which was 4G, and then to IoT and always-on devices, which is what 5G is about. 4G and 5G have also increased the number of active devices possible.

Add to that WiFi calling and all the other little enhancements, and I wouldn’t be the least bit surprised if the specification is closer to 50,000 pages today.

[1] https://en.wikipedia.org/wiki/Enhanced_Messaging_Service
[2] https://en.wikipedia.org/wiki/Multimedia_Messaging_Service
[3] https://en.wikipedia.org/wiki/Rich_Communication_Services

Thanks for taking the time to write such detailed replies. I realize my reply was off the cuff; it's just astonishing that that much information was able to be absorbed and operationalized. Hats off to folks that were able to make it happen at all!

Incredible reply, thank you. People like you are why I come here every day.