Hacker News
by stinkbeetle 17 days ago
> There were highs of elegance, yeah. The OpenBoot PROMs introduced with the SPARCstation were marvelously functional and beautifully elegant, especially compared to the previous pre-boot environment. But when you look under the cover, you find a million patches of duct tape, like Sun having to force their compilers to avoid using the o7 register due to speculative instruction prefetching sometimes triggering DMA activity on a peripheral card and causing an unintended side effect. This was due to one buggy CPU (the 80 MHz Weitek upgrade CPU for the SS2), but the bug required changes for all sun4c kernels (at a minimum).

Do not look at ACPI, boot firmware, or the CPU microcode, instruction match "patch" modes, chicken bits, or any of the other horrible hacks required for modern CPUs to run :)

CPUs have more or less always operated under the same constraint as any other engineering project, which is to optimize the cost/value of the thing. That means at some point you bake the silicon that is guaranteed to have known and unknown bugs in it. CPUs sit in a different place on this spectrum than software does, thanks to the relative ease of software patching, so they certainly get far stronger testing and verification treatment before shipping. But underneath it's bugs and hacks, and there is enormous infrastructure baked into the silicon purely for finding and fixing the bugs that inevitably escape that QA: everything from a sprinkling of spare gates and latches around the chip, which you can use for post-synthesis or metal-layer fixes, to fallbacks and fixups everywhere. There are watchdogs, hang timers, and state condition checks in the core and SMP fabric so that if some known or unknown condition causes a deadlock or livelock, you can hit it with a hammer and drop to some slow mode (e.g., single-issue, non-speculative, in-order) for a while to clear it up.
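
To make the hang-timer idea concrete, here is a toy software model of it. This is only a sketch under assumptions: the class name, the thresholds, and the "fast"/"slow" labels are invented for illustration, and real hardware does this with counters and state machines in silicon, not Python.

```python
class HangWatchdog:
    """Toy model of a forward-progress watchdog (all names hypothetical)."""

    def __init__(self, hang_limit=1000, slow_cycles=5000):
        self.hang_limit = hang_limit    # cycles without a retire before we act
        self.slow_cycles = slow_cycles  # extra cycles to stay slow after the trigger
        self.idle = 0                   # consecutive cycles with nothing retired
        self.slow_left = 0              # remaining cycles of slow mode

    def tick(self, retired):
        """Advance one cycle; return the mode the core runs in this cycle."""
        if self.slow_left > 0:
            self.slow_left -= 1
            return "slow"
        self.idle = 0 if retired else self.idle + 1
        if self.idle >= self.hang_limit:
            # No forward progress for too long: fall back to the safe mode
            # (think single-issue, non-speculative, in-order) for a while.
            self.idle = 0
            self.slow_left = self.slow_cycles
            return "slow"
        return "fast"
```

With `hang_limit=3` and `slow_cycles=2`, a core that never retires anything runs fast for two cycles, is forced slow for three (the trigger cycle plus two more), and then gets another chance at full speed.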

CPUs in embedded or certain vertically integrated shops did have the issue that working around hardware bugs in the compiler or the applications was viable, so you would get a bunch of craziness leaking out (there are or were patches in binutils to pad code so it doesn't put branch instructions at the end of a page, things like that, for more than one CPU). ARM and x86 CPUs today would absolutely ship with bugs like this if backward compatibility were not extremely important and if the hardware vendors had more control over the software stack.
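
For a feel of what that kind of toolchain workaround looks like, here is a minimal sketch of the padding idea. It is not the actual binutils code: the fixed 4-byte instruction width, the 4 KiB page size, the mnemonic set, and the function name are all assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
INSN_SIZE = 4     # assumed fixed-width instructions
BRANCHES = {"b", "bl", "beq", "bne"}  # hypothetical branch mnemonics


def pad_branches(insns, base=0):
    """Return a copy of insns with a nop inserted before any branch that
    would otherwise occupy the last instruction slot of a page."""
    out = []
    addr = base
    for insn in insns:
        in_last_slot = addr % PAGE_SIZE == PAGE_SIZE - INSN_SIZE
        if insn in BRANCHES and in_last_slot:
            out.append("nop")  # push the branch onto the next page
            addr += INSN_SIZE
        out.append(insn)
        addr += INSN_SIZE
    return out
```

A real assembler or linker would also have to recompute branch offsets and alignment after inserting padding, which this sketch ignores.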

There were a bunch of serious user-visible speculative execution bugs in ~all modern high performance CPUs within the last decade (yes, AMD, ARM Ltd, and I believe Apple all had speculative execution security side channels too). Occasional issues with user and supervisor level can be seen in errata documents too. Often they can be fixed with "firmware" (which means microcode, chicken bits, etc.), but they still exist.

1 comments

True! That was my point exactly: (for the most part) the old workstations weren't special or magical relative to PC hardware. When you pull old DEC and Sun hardware manuals on Bitsavers or wherever, they're chock-full of ink from manual corrections and errata. Old Ethernet NICs are especially bad... :D

This isn't to disparage them, either. GP admits they are romanticizing; I'm just offering my own perspective on it. When I call old stuff "hacked together piles of garbage", I mean it with the loving connotation of someone whose home office has a MicroVAX 3400, Sun 4/75, DEC PWS 433a, and a POWER9 workstation piled in the corner, all on a KVM switch. I love tinkering on these old machines, but I think it's healthy to remember they're not beacons of 80s/90s perfection, but products that were made and sold under time/cost constraints, as you said.

... Though, I will say, the MicroVAX was running from the late 80s until about 2018 in a university environment, and its HDDs still report no errors. That is pretty remarkable ;)

To add a bit of context: I'm not even romanticizing the actual implementations, which may or may not have had horrible bugs and errata. Rather, it's the abstract concept, the ability to have a reasonable expectation that, for example, the firmware would be completely operable, scriptable and so on and so forth from a serial line. Stuff like this.