by hulitu 1871 days ago
This is really sad. The world is heading to a duopoly x86 - arm. Alpha is dead, Mips is almost dead, PA-RISC is dead, POWER is too expensive and RISC-V is mostly nice to have.
8 comments

A lot of these architectures have some drawbacks in modern times.

Alpha’s loosey-goosey memory model makes multithreaded code on SMP systems more challenging. Linux utilizes its Alpha port as a worst-case testbed for data race conditions in its kernel.

SPARC’s register windows are anachronistic and complicate the implementation of CPUs, and I’d guess also make it more difficult to build OoOE cores (so many SPARC chips are in-order, why?)

POWER isn’t so bad though. It’s open enough where you could build your own lower-cost core if you wanted. There’s nothing intrinsic to the ISA that would mandate an expensive chip other than volume constraints.

PA-RISC put up some great numbers back in the day but between the Compaq acquisition (bringing with it Alpha) and Itanium it was chronically under-resourced. They had a great core in the early 90s and basically just incrementally tweaked it until its death.

You could even build your own Power ISA system with Microwatt, which is fully synthesizable and growing by leaps and bounds.

https://github.com/antonblanchard/microwatt

(Disclaimer: minor contributor)

I really liked PA-RISC. I thought it was a clean ISA with good performance at the time and avoided many of the pitfalls of other implementations. I think HP didn't want to pour lots of money into it to keep it competitive, though, and was happy to bail out for Itanium when it was viable. My big C8000 is a power hungry titan, makes the Quad G5 seem thrifty.

IDK, I never really liked PA-RISC, but to be fair I was always able to look at it from a hindsight perspective. Looking back it seems to have most of the RISC issues that complicate modern ISA design. Like branch delay slots, having a multiply instruction wasn't RISCy enough for it to bother with, etc.
...and MIPS has the weird branch delay slots as well as pretty horrible code density.

If you look at ARM, particularly the 64-bit version, you'll notice it attempts to squeeze multiple operations into a single 32-bit "instruction". It's still called RISC, but not really "reduced" anymore.

Nowadays RISC seems to mean "load-store architecture" but I think the term should be left in the 90s. CS curriculum is slow to evolve.
Not sure anyone sees "pure" RISC as being an advantage these days though. Didn't Intel demonstrate that you could get RISC-like performance from a CISC ISA even with all the drawbacks of x86 (instruction decoding complexity etc.)?
> Alpha’s loosey-goosey memory model makes multithreaded code on SMP systems more challenging.

I thought Alpha and ARM were the same with respect to that.

ARM had some fairly nasty to track down XFS file system corruption bugs for quite a while for exactly this reason.

The issue has always been that x86 goes out of its way to generally be more forgiving than the spec.

> Linux utilizes its Alpha port as a worst-case testbed for data race conditions in its kernel.

Is that still true in the present tense? Anybody doing this in 2021? Seems like alpha has been dead for a long time.

Not Linux but the Linux formal memory model. The idea is that the compiler optimizations can be as nasty as the Alpha out of order execution engine and cache. The Linux code has to cater for these optimizations even though it will not result in an actual assembly instruction on anything except the Alpha. Problem is, on Alpha there's indeed an actual price to pay in performance for that nastiness.
I suspect you misunderstand the question. My question is if anybody is presently using alpha hardware to verify such correctness. I understand memory models and barriers etc. and that alpha is one of the most relaxed on this front, that it historically influenced the kernel code and was previously very important test hardware. But the hardware is now very dated, to the point where it might not be good test hardware.
The answer to that question is no, but the Alpha is still considered the least common denominator even though the hardware is obsolete. When people write litmus tests for the Linux memory model they are still validated against Alpha semantics, because compiler optimizations have the same reordering effects as the weird caches of Alpha processors.

(The stroke of genius of the C++11 memory model, compared to the older Java memory model, was that reordering could be treated the same way no matter if performed by processors or compilers).
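
To make the litmus-test idea concrete, here is a toy sketch of the classic message-passing test. It is only an illustration: the `WRITER`/`READER` programs and the `outcomes` helper are made up for this comment, the "relaxed" mode simply lets each thread's two independent accesses swap places, and it models neither real Alpha semantics nor the format or tooling the kernel actually uses.

```python
from itertools import permutations

# Message-passing litmus test. Shared variables start at 0.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r1"), ("load", "data", "r2")]

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one global schedule and return the observed (r1, r2)."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r1"], regs["r2"]

def outcomes(allow_reordering):
    """Collect every (r1, r2) reachable under the chosen toy model."""
    if allow_reordering:
        # Toy assumption: accesses to different addresses within one
        # thread may be performed in either order.
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        # Sequential consistency: program order is preserved.
        writer_orders = [tuple(WRITER)]
        reader_orders = [tuple(READER)]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(w, r):
                results.add(run(schedule))
    return results

strict = outcomes(allow_reordering=False)
relaxed = outcomes(allow_reordering=True)
print(sorted(strict))
print(sorted(relaxed - strict))  # outcomes that only reordering makes possible
```

The interesting outcome is `(1, 0)`: the reader sees the flag set but still reads stale data. It cannot occur when program order is kept, and it appears as soon as either side may reorder. Whether that reordering comes from the CPU or from the compiler makes no difference to the outcome set, which is the point being made above.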

I know that GCC for example is tested on a bunch of wacky old - read: obsolete ;) - hardware, so it's certainly possible that the same is true for Linux.
On the plus side, at least it looks like it will be a duopoly. For a long time it looked like x86 would eat all other architectures.
It's mildly interesting to me that there's now really no notable big-endian systems left, yet that's still the network byte order. I wonder what the math is for the amount of global wasted CPU cycles on byte-swapping for things that would do a fair amount of that...DNS for example.
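
For anyone who hasn't looked at what the swapping involves for DNS, a minimal sketch: the 12-byte DNS header is six 16-bit fields sent in network (big-endian) order. The helper name `pack_query_header` and the query ID are invented for the example.

```python
import struct
import sys

# DNS header: six 16-bit fields (ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT).
# "!" selects network (big-endian) byte order regardless of the host CPU.
DNS_HEADER = struct.Struct("!6H")

def pack_query_header(query_id: int) -> bytes:
    """Build the header of a one-question query with recursion desired."""
    flags = 0x0100  # RD (recursion desired) bit set
    return DNS_HEADER.pack(query_id, flags, 1, 0, 0, 0)

header = pack_query_header(0x1234)
print(header.hex())   # wire bytes, most significant byte of each field first
print(sys.byteorder)  # byte order of the host running this script

# The same ID packed in little-endian order has its two bytes reversed.
wire_id = struct.pack("!H", 0x1234)
little_id = struct.pack("<H", 0x1234)
```

On a little-endian host every one of those fields is byte-swapped on the way out and again on the way in.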
> It's mildly interesting to me that there's now really no notable big-endian systems left

That's not correct. s390x is big-endian and well supported in all enterprise distributions such as SLE, RHEL as well as Debian and Ubuntu.

Though as we recently learned, it's considered sufficiently "fringe" by a big chunk of the development community that it's not that big a deal to drop support for it. (Not to imply IBM couldn't be sponsoring development for it more).
If you're talking about the Python cryptography fiasco, that was about dropping support for S390 (the 31-bit architecture, discontinued in 1999). S390X (the 64-bit architecture, introduced in 2000) is supported by Rust, though not necessarily by Python Cryptography.

Incidentally Rust's continued support for S390X is driven primarily by cuviper who works for Red Hat (even before the IBM acquisition).

But s390x support isn't dropped anywhere. On the contrary, IBM spends a lot of money and effort to make sure it is well supported by free software.
Notable in terms of global cpu capacity. Linux on zSeries is interesting, but only makes financial sense in some pretty limited scenarios.
Power ISA systems are still bi-endian and many systems run big. In fact, the low level OPAL interfaces require you to be in big-endian mode, AIX and i are still BE, and both FreeBSD and OpenBSD have BE flavours for current PowerNV systems. Even a few Linux distros run big (Adelie comes to mind). They're definitely a minority but they're still around.
Power ISA includes an endianness switch in the spec. Power and Power64 are all BE and LE. Most Linux distros only support modern versions on LE though. Debian has a BE port but it's not considered a primary release target.

The last PPC64 release of Ubuntu was 16.04 which is now out of support by about a month. Even on that, the two major web browsers didn't support building on the platform for a long time.

Yes, it can be done if you want enough to do it.

https://catfox.life/2018/11/03/clearing-confusion-regarding-... for more info

Yes, I wasn't claiming that no big endian systems exist. Just that they are overwhelmingly in the minority now, and so the number of ASM byte swapping ops happening is mildly amusing.
Many CPUs have load/store instructions that perform the network byte order swap with no/minimal overhead.

Serialization formats like JSON/YAML/protobuf/etc. would be much more costly by comparison.
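
As a rough sketch of what such an instruction does in one step (x86 `MOVBE` and PowerPC `lwbrx` are examples), here is the swap spelled out with shifts and masks. The helper names `bswap32` and `load_be32` are made up for illustration.

```python
import struct

def bswap32(value: int) -> int:
    """Reverse the four bytes of a 32-bit unsigned integer."""
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )

def load_be32(buf: bytes, offset: int = 0) -> int:
    """Read a big-endian 32-bit value, like a byte-reversed load would."""
    return int.from_bytes(buf[offset:offset + 4], "big")

packet = bytes([0xC0, 0xA8, 0x00, 0x01])    # 192.168.0.1 in network order
as_little = struct.unpack("<I", packet)[0]  # what a plain little-endian load yields
swapped = bswap32(as_little)                # plain load followed by a swap
direct = load_be32(packet)                  # swap folded into the load
print(hex(as_little), hex(swapped), hex(direct))
```

Both routes give the same value. The swap is a fixed handful of bit operations per word, while a text format has to be parsed character by character.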

IIRC ARM devices can also be big-endian and GCC can even generate big endian 64-bit ARM code:

https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Yeah, you can find an ARM big endian distribution of, for example NetBSD. No Linux that I can find. Apparently boot issues are a bit tricky.
I have found a Gentoo distribution with big endian for the raspberry pi 3, so it is out there https://github.com/zeldin/linux-1/releases
I'm fairly sure that NetBSD/arm switches to big endian once the kernel is running, the boot process is unchanged.
The thing holding it back for Rpi4 is UEFI+ACPI, so I assume there's some boot process changes.
ACPI is problematic in big endian.
You used to be able to get debian arm-be, but that was a good 15 years ago.
IBM mainframes are big endian, all the Linux distros support them too.
IBM mainframes are big endian essentially because punch cards are big endian.

(Punch cards are big endian because the number 123 is punched as "123". So that's the order a decimal number will be stored in memory. The System/360 mainframes (1964) had a lot of support for decimal numbers and it would be kind of bizarre to store decimal numbers big-endian and binary numbers little-endian so everything was big endian. IBM's current mainframes are compatible with S/360.

On the other hand, in a serial computer, you operate on one bit at a time, so you need to start with the smallest bit for arithmetic. The Intel 8008 was a copy of a serial TTL computer, the Datapoint 2200, so it was little-endian. x86 is based on the 8008, so it kept the little-endian architecture.)
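
A small sketch of why a bit-serial machine wants the least significant bit first: the carry out of each bit position feeds the next higher one, so the adder has to consume bits in that order. The function names are invented for the example, and this models no particular machine.

```python
def to_lsb_first(value, width):
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_lsb_first(bits):
    """Reassemble an integer from bits given least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))

def serial_add(a_bits, b_bits):
    """Add two equal-length bit streams one bit at a time, LSB first."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)   # sum bit for this position
        carry = total >> 1      # carry into the next, more significant bit
    return out, carry

bits, carry = serial_add(to_lsb_first(200, 8), to_lsb_first(100, 8))
result = from_lsb_first(bits)
print(result, carry)  # low 8 bits of the sum, plus the carry out
```

The same reasoning scales up to multi-byte numbers: if the low byte arrives first, its carry out is ready by the time the next byte is processed.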

I know of at least one micro arch that'll fuse load and byte swap instructions into a reverse endian load. There's still probably a detectable overhead, but it's not the end of the world to hack on later.
There are various open-source and white-box network switches and routers - do any of them run big-endian? If not, it must be a solved problem (perhaps by fast-path dedicated ASICs).
> If not, it must be a solved problem (perhaps by fast-path dedicated ASICs).

Correct. The data plane of all 'real' networking is done in ASICs and/or NPUs.

Surprisingly less true these days.

Increasingly things seem to be moving towards ASICs for switching and general purpose CPUs (usually with a lot of support from the NIC offload capabilities) for routing, even in 'real' networking hardware.

The vast majority of fabric ASICs would never actually utilize additional TCAM necessary to support full tables at line rate in hardware because top of rack switches do not have that many addressable targets, so it's a wasted cost.

And with DPDK optimized software implementations are achieving zero drop line rate for even 100G+ interfaces for much, much lower cost than full table routing ASICs married to fabric ASICs in a chassis switch.

It's not something a lot of users are aware of -- they often think they've bought an ASIC-based router! -- but essentially all of the big vendors' entry and mid-level devices are software routers, and they're even trying to figure out how to sell their NOS experience on whitebox hardware without undercutting their branded hardware.

> It's not something a lot of users are aware of -- they often think they've bought an ASIC-based router! -- but essentially all of the big vendors' entry and mid-level devices are software routers, and they're even trying to figure out how to sell their NOS experience on whitebox hardware without undercutting their branded hardware.

To be fair [to you], my original claim is a bit of a tautology as I don't really consider software/CPU based CPE gear to be 'real' networking.

I should be more specific. High radix switches/routers are, unequivocally, not built out of CPUs and software, period. To the point of the original discussion, these concentration points are the only place that byte order overhead would be significant. Others in this thread claim it's not significant even in CPU implementations due to optimized instructions, but I personally can't opine on that.

Last NPU I worked with (admittedly 10+ years ago) was little endian! It used load/store-swapped instructions. (Why? I can only guess that they licensed a little-endian CPU core for other reasons.)
Raptor Computing has some expensive-but-not-that-expensive POWER systems:

https://raptorcs.com/

x86: 8086 1978 x64: 1999

ARM: 1985 ARM64: 2011

RISC V: 2010

It took x86 about 10 years (1988) to become the most popular, and until 2005 to cause Apple to switch (another 17 years)

It took ARM about 25 years (2010) to become the most popular, and until 2020 to cause Apple to switch (another 10 years)

Apple has been using ARM on and off since 1993. They have more long term organizational experience with ARM than they did with x86.

The Newton, then the iPod, then the iPhone, and now the M1.

The iPhone is a more important device for Apple than the Mac from a revenue point of view, and they've sold more devices with ARM chips in them than they have 68k, PowerPC, or x86. They've sold 2.2 billion iPhones. I can't find an easy number on how many Macintoshes they've sold in total, but I can't imagine it's close to that.

In fact, they used ARM in the Newton (1993) before they used PowerPC in the Power Mac (1994).

“Switch” to what? Apple is one of the founders of ARM and still holds ARM shares IIRC
They sold their 40% stake in ARM when they were short of cash.

Switch from PowerPC to Intel, and then from Intel to ARM. I'm using Apple as a tipping point, marking when the new architecture was so much better than the old that it completely took over. Obviously with 90% of Apple devices being ARM already it was an easier choice for them this time. But each incumbent architecture gets more entrenched as its market grows many times bigger, so it may be more difficult for the new entrant.

That's why RISC V's win (if it occurs) will be because it's Open Source. Linux won in 30 years against everyone else due to that.

On the Apple specific case I think any move to RISC V would be because it would want more control than it has with Arm. It could then take the RISC V ISA in the direction it wants.

I'm guessing it already has a lot of influence over Arm though and there are other factors that strongly act in favour of staying with Arm.

If Nvidia takes over Arm though and starts making life difficult for the ecosystem then that could change ....

Apple is really interesting. With chip design moved in-house and the ease with which they seem to switch architectures, they could move away from ARM if the Nvidia purchase happens. I think they’d want to avoid it, at least for the next 10 years.

It would be interesting to know how important the ARM instruction set is to Apple.

I wouldn't be completely surprised if there is a box running a build of Mac OS for RISC V somewhere in Cupertino!

Seriously though, I suspect that the ISA isn't that important for Apple but on the other hand I think they're probably quite happy with the direction of the Arm ISA (probably had a big say in parts of it) and it would take quite a lot to push them away.

I think that the odds on the Nvidia takeover are quite small by now so don't think a move likely at all.

Given Apple's history, and their business style, I don't think they have loyalty to any architecture or any specific technology in particular. They care about product first, and choose whatever technology they need to choose to get there. https://youtu.be/oeqPrUmVz-o?t=113
> they could move away from ARM if the Nvidia purchase happens.

The Nvidia purchase is irrelevant to Apple. They have a license that won’t be impacted.

The only thing that would make them move away would be a performance bottleneck in the architecture that necessitates a shift.

Pretty sure Apple has a permanent ARM license. They'll watch what happens with Nvidia, but it doesn't really affect them because, as you say, all the secret sauce is in-house.
Apple switched from Motorola 68k to PowerPC, too, and Sun switched from 68k to SPARC. The Amiga, NeXT, early Palm devices, and the ST were also using members of the 68k family. That's an ISA born in 1979 and largely replacing (and inspired by) the 6800 (1974) which had a 16-bit address bus and 8-bit memory bus and its (binary incompatible but with the same assembly language) little brother the 6809 (1978). The Tandy Color Computer and the Dragon were notable 6809 systems.

That, of course, is just with the Mac since Apple previously used variants of the MOS 6502 (1975 and allegedly an illicit clone of the MC6800). Apple, Atari, Acorn, Commodore (the owner of MOS for several years), BBC, Oric, and Nintendo used it in multiple systems each. Apple, Acorn, and Nintendo built additional systems on its updated sibling the WDC65816 series (1983).

The 6800/6809/Hitachi 6300/68k/Dragonball/Coldfire dynasty and the bastard MOS6502/WDC65816 families were collectively basically the ARM of their day in a way. Everyone targeting low-priced or power-sipping designs was building platforms around them at one time or another. Acorn went from a customer to a major competitor and successor.

It should be noted that the PowerPC and the whole POWER ISA multi-platform family was largely inspired by Apple in the first place. They were talking to IBM about a new platform and invited Motorola to the talks as their long-time processor provider. They formed the "AIM Alliance" that eventually morphed into the POWER Foundation and OpenPOWER initiatives. I can't really speak to how much of POWER ISA is inspired by Motorola's own "RISC" processor, the 88000 series.

It was only a few years ago that the world was heading to an x86 monopoly.
Yes, but where can I buy a SPARC CPU? How many of those who have/can have it are running Illumos and are putting money/time in it? And more importantly what's the outlook for SPARC?
> Yes, but where can I buy a SPARC CPU?

You can buy them used or new in various kind of servers.

> How many of those who have/can have it are running Illumos and are putting money/time in it?

Dunno, I'm not really a Solaris guy. I use Solaris as a hypervisor for Linux and BSD LDOMs.

> And more importantly what's the outlook for SPARC?

Well, you could make the very same argument about Illumos. The Python developers wanted to drop support for Solaris already and OpenJDK upstream did actually drop it.

For illumos, the sweet spot is the 10-15 year old Sun gear you can pick up on eBay. Works well, supported, not overly expensive.

Newer SPARC systems are really quite good. And pretty cost-effective too. The problem is that the starting price is out of reach, and almost nobody is offering a cloud service based on SPARC, so you can't hire it either.

I'm running illumos on SPARC. I have some old hardware (desktop and server) that I like to make use of. Time, yes, but I'm not putting money into it.

And while OpenJDK upstream has dropped support for SPARC and Solaris, that was really all about problems with the Studio compiler. I'm maintaining an illumos OpenJDK port with the gcc toolchain on x86 - it's not excessively hard, and realistically if you're using a common toolchain and common CPU most standards-compliant code is portable at the OS layer.

I have SPARC systems, I run NetBSD on them though not Illumos.
Intel (née Movidius) were selling SPARCs a year or two ago in the Myriad 2. SPARC is an open source CPU with solid GCC support.
Keep an eye on Tenstorrent AI cpus...