by chx 1810 days ago
GE Canada has VAXen running their atomic plants and they are under contract to keep 'em running for a long time, 2035 or 2045. (Might be called BWXT today.)
3 comments

Train signals in Melbourne, Australia used to be controlled by PDP-11s running Ericsson JZA715 train control software. They were replaced by Ospreys. An Osprey is actually a hardware PDP-11 CPU on an expansion card which plugs into an x86 PC bus. It uses an actual CPU, not emulation, because real-time applications like train control need to be 100% cycle accurate. (Originally they used actual PDP-11 CPU chips manufactured by DEC; later they switched to using FPGAs.) It also has Unibus cards to do Unibus-ISA/EISA/PCI translation so it can integrate with the original peripherals.

http://web.archive.org/web/20210126085900/https://www.equico...

http://www.strobedata.com/home/ospreyguide.html

Why does train control need to be 100% cycle accurate?

PDP-11s ran at what, 1.25 MHz? I'd think that a modern CPU emulating a PDP-11 CPU in software could get below those cycle times.
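To put a rough number on that, here is a minimal sketch of the cycle budget. The 1.25 MHz figure is the one guessed at above; the 3 GHz host clock is purely an assumption for illustration, and this says nothing about worst-case latency, only the average budget.

```python
# Rough cycle budget for emulating one PDP-11 clock cycle on a modern host.
PDP11_CLOCK_HZ = 1.25e6  # figure quoted in the comment above (unverified)
HOST_CLOCK_HZ = 3.0e9    # assumption: a 3 GHz host core


def cycle_time_ns(clock_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def host_cycles_per_emulated_cycle(emulated_hz: float, host_hz: float) -> float:
    """How many host clock cycles elapse during one emulated clock cycle."""
    return host_hz / emulated_hz


print(f"PDP-11 cycle time: {cycle_time_ns(PDP11_CLOCK_HZ):.0f} ns")
print(
    "host cycles per emulated cycle: "
    f"{host_cycles_per_emulated_cycle(PDP11_CLOCK_HZ, HOST_CLOCK_HZ):.0f}"
)
```

Under those assumptions the host has on the order of a couple of thousand of its own cycles for each emulated one, which is why average throughput is not the concern here.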

> Why does train control need to be 100% cycle accurate?

It's not uncommon for IO interfaces to be cycle-sensitive. I can only speculate, but perhaps they do not use a timer and it is cycle-exact code for certain timing operations. They could be bit-banging a serial protocol, as that was done then as now. Or they could be controlling very sensitive things where even a few microseconds of jitter is unacceptable. Tying it so closely to the CPU like that was a dirty but common practice in the 70s and 80s. And sometimes simply obligatory! There is no high precision timer on the low-end PDP-11s by default, I believe.

A few cache misses in a row and some mispredicted branches, and your modern CPU is slower than 1.25 MHz. This almost never happens (CPUs are very good at this), but when it does things can get bad.
You could keep all of the PDP-11's RAM and probably most of its "external storage" in the L2 cache of most modern CPUs.
https://en.wikichip.org/wiki/amd/microarchitectures/zen_3

> 512 KiB per core, 8-way set associative

So with the 64-core 7763 / 7713 you are looking at 32 megabytes of L2.

The 11/70 in 1975 already had 22 bit address space and like 10MB disks. So -- yes, it's not unlikely with a bit of hand crafting you could keep the whole shebang in L2 but you need a top end CPU.
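A quick sanity check of that arithmetic, as a sketch only. It assumes "like 10MB" means one disk of 10 decimal megabytes, and it sums L2 across all 64 cores, which is an aggregate figure since the quoted 512 KiB is per core.

```python
# Back-of-the-envelope check: does an 11/70's full address space plus one disk
# fit in the aggregate L2 of a 64-core Zen 3 part?
KIB = 1024
MIB = 1024 * KIB

L2_PER_CORE_BYTES = 512 * KIB  # Zen 3 figure quoted above
CORES = 64                     # EPYC 7763 / 7713

ADDRESS_BITS = 22              # 11/70 address space, per the comment above
DISK_BYTES = 10 * 10**6        # assumption: "like 10MB" read as 10 decimal MB


def total_l2_bytes(cores: int, per_core_bytes: int) -> int:
    """Aggregate L2 across all cores (not one contiguous cache)."""
    return cores * per_core_bytes


def pdp11_footprint_bytes(address_bits: int, disk_bytes: int) -> int:
    """Full address space plus one disk image."""
    return 2**address_bits + disk_bytes


l2 = total_l2_bytes(CORES, L2_PER_CORE_BYTES)
footprint = pdp11_footprint_bytes(ADDRESS_BITS, DISK_BYTES)

print(f"aggregate L2:  {l2 / MIB:.0f} MiB")
print(f"address space: {2**ADDRESS_BITS / MIB:.0f} MiB")
print(f"fits in aggregate L2: {footprint <= l2}")
```

A single emulator thread would only see its own core's 512 KiB as local L2, so the "hand crafting" caveat above still applies.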

Let's say the modern CPU gets itself really tied up in knots and is out of action for a staggering 10 ms. During that time a speeding train doing 350 km/h travels not quite a meter. Do trains run on such tight schedules that this is enough to delay actuating a switching element and cause an accident?
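The "not quite a meter" figure checks out; a minimal sketch of the arithmetic, using only the speed and stall time given above:

```python
# Distance a train covers while the controller is stalled.
def distance_during_stall_m(speed_kmh: float, stall_ms: float) -> float:
    """Distance in meters travelled at speed_kmh during stall_ms milliseconds."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
    return speed_m_per_s * (stall_ms / 1000.0)   # ms -> s


d = distance_during_stall_m(350.0, 10.0)
print(f"{d:.3f} m")  # about 0.972 m
```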
You have a legacy safety-critical system, which incorporates legacy hardware peripherals. How sensitive is it to changes in timing? You may not actually know. Do you want to do the engineering analysis necessary to prove that replacing one part of that system with potentially different timing is not going to cause problems? Or do you just seek out a replacement whose timing is as close as possible to the original?

The big issue may not be with the trains themselves but the communications protocols used to talk to signalling equipment and other peripherals. Changing the timing in the communication with them may lead to problems.

And what if the original software has race condition bugs which have never been surfaced, and the occasional inaccuracy in timing starts to surface them? Good luck fixing bugs in some obscure piece of PDP-11 software that was written in the 1970s.

You could always set up a train-in-a-box system and iterate through all the control logic sequences to verify that they stay within the margin of error. Once you know that, equipment substitution is straightforward.
I have no idea. If this is real time control that could mean you keep running the motors in the switch long enough to damage something. Or maybe you go past the end of travel switch signal without reading it, the switch turns off and you never stop... There are a lot of ways real time systems can fail.

You are correct that 10ms is well within the margin of error for safety stopping a train, but it may be out of the margin for some subsystem in the control.

With bipolar memory option, the pdp-11/45 had a 300 ns memory cycle time.
I would love to have someone write an article about how they even managed to do that.

Are they real VAX servers, or have they been virtualized/emulated somewhere down the line?

More than you would imagine.