Hacker News
by trynumber9 614 days ago
Apple is the only one making ARM chips fast enough to be competitive even with emulation.

Qualcomm isn't at that level; its chips are only on par with AMD and Intel when running native code, without emulation.

The market won't move from the x86 duopoly to Apple's walled garden just because Apple has a fast chip. It's on ARM to make a licensable core so much faster than the x86 options that people actually move to it.


> Apple is the only one making ARM chips fast enough to be competitive even with emulation.

I'd modify that to: Apple is the only one making ARM chips fast enough to be competitive, period. None of the other cores from ARM or Qualcomm are as fast as the top Intel and AMD x86 CPUs, just maybe more efficient. That is why Windows on ARM keeps failing: Microsoft has to use the same slow off-the-shelf cores as everyone else.

I'm a big Surface fan, and wish they had a version with an Apple M-class SoC in it, but every ARM version (which used the fastest non-Apple ARM core at the time) has been a dog compared to the same model with Intel. Just give me the iPad Air SoC in a Surface...

> All of the other cores from ARM or Qualcomm aren't as fast as the top Intel and AMD x86

I don’t think top CPU performance is relevant. I was working on some C code for science stuff inside Termux on a Pixel 7a, and would’ve been perfectly OK with that performance in a standard laptop form factor. I even noticed some branch prediction was better than on x86. The real issue is that no one is making a decent ARM laptop with NVMe, enough RAM, etc.

> I was working on some C code for science stuff inside Termux on a Pixel 7a

I don't want this to come across as rude or condescending, but who hurt you?

Ha, I was really benchmarking ARM, not writing code from scratch.

> just maybe more efficient

Isn’t efficiency all that matters nowadays? It’s exactly what gives Apple the crazy battery life and allows them a lot of thermal headroom to drive the chips with high power.

It’s not clear to what extent this is necessitated by the ISA as opposed to caused by the implementation. Recent x86 chips have caught up a lot in terms of efficiency.

Apple's emulation literally implements some x86 in hardware, and they will likely drop that silicon when the transition ends, so I wouldn't rely on that.

It doesn't implement any x86 in hardware; it implements some memory-ordering guarantees in hardware to match what x86 requires.

However, that is a very minor addition; there is no actual x86 hardware in Apple's ARM chips.

https://www.sciencedirect.com/science/article/pii/S138376212...

Other ARM processors could easily implement TSO (total store ordering) as well, to provide the same memory-ordering guarantees. Beyond that, Rosetta translates the x86 code to ARM instructions.

Besides TSO, there are a bunch of ARM extensions designed to match x86 behavior. https://dougallj.wordpress.com/2022/11/09/why-is-rosetta-2-f... I'm curious whether Oryon and X925 implement these instructions and whether other emulators like FEX use them.

He didn't say it implemented any instructions. I would argue that memory ordering is quite important, and part of why other CPU architectures can sometimes be faster or more efficient than x86: it takes effort to guarantee atomicity and apparent read/write ordering, and when a CPU expends effort, that costs either transistors or time.
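A toy illustration of why this ordering matters for x86 emulation is the classic message-passing pattern: write data, then set a flag; the reader checks the flag, then reads the data. The sketch below enumerates every interleaving of the two threads under two simplified models. One keeps each thread's accesses in program order (what TSO guarantees for this store/store and load/load pattern); the other lets each thread reorder its two accesses freely, a crude stand-in for plain AArch64 loads and stores without barriers. The model and all names in it are assumptions for illustration; the real Arm memory model is more subtle (address dependencies, for instance, also constrain ordering).

```python
from itertools import permutations

# Message-passing litmus test. Each op is (kind, location, value_or_register).
PRODUCER = [("W", "data", 1), ("W", "flag", 1)]        # write data, then publish flag
CONSUMER = [("R", "flag", "r1"), ("R", "data", "r2")]  # read flag, then data


def thread_orders(ops, allow_reorder):
    """Orders in which one thread's ops may take effect in this toy model."""
    if allow_reorder:
        # Crude stand-in for plain AArch64 accesses with no barriers.
        return [list(p) for p in permutations(ops)]
    # TSO keeps store->store and load->load in program order.
    return [list(ops)]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(allow_reorder):
    """Set of (r1, r2) results reachable in the model."""
    results = set()
    for p in thread_orders(PRODUCER, allow_reorder):
        for c in thread_orders(CONSUMER, allow_reorder):
            for trace in interleavings(p, c):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, loc, arg in trace:
                    if kind == "W":
                        mem[loc] = arg
                    else:
                        regs[arg] = mem[loc]
                results.add((regs["r1"], regs["r2"]))
    return results


tso = outcomes(allow_reorder=False)
relaxed = outcomes(allow_reorder=True)
print("in order: ", sorted(tso))      # [(0, 0), (0, 1), (1, 1)]
print("reordered:", sorted(relaxed))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Outcome (1, 0), seeing the flag but stale data, only shows up when reordering is allowed. x86 code relies on it never happening, so a translator running on a core without a TSO mode has to add barriers or ordered load/store instructions to potentially every memory access; hardware TSO avoids that cost.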

Apple also implements a couple of x86 status-register flags that AArch64 lacks.
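Assuming the flags meant here are x86's parity flag (PF) and auxiliary-carry flag (AF), neither of which exists among AArch64's NZCV condition flags, this is roughly what a translator has to compute in software when translated code might read them. A minimal sketch with hypothetical function names, not how Rosetta actually does it:

```python
def parity_flag(result: int) -> int:
    """x86 PF: 1 if the low byte of the result has an even number of set bits."""
    return 1 if bin(result & 0xFF).count("1") % 2 == 0 else 0


def aux_carry_flag_add(a: int, b: int) -> int:
    """x86 AF for ADD: 1 if the addition carried out of bit 3 into bit 4."""
    result = a + b
    # Bit 4 of a ^ b ^ result is the carry that came into bit 4.
    return ((a ^ b ^ result) >> 4) & 1


# Example: 0x0F + 0x01 = 0x10 carries out of the low nibble.
r = (0x0F + 0x01) & 0xFF
print(hex(r), "PF =", parity_flag(r), "AF =", aux_carry_flag_add(0x0F, 0x01))
# prints: 0x10 PF = 0 AF = 1
```

Translators typically compute flags like these lazily, only when a later instruction reads them, but each computation still costs extra ARM instructions, which is the overhead hardware support removes.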

> Apple is the only one making ARM chips fast enough to be competitive even with emulation.

My big problem is Rosetta 2 doesn’t emulate AVX which more and more software uses.

I work on an AI platform team. I’m not actually trying to do machine learning stuff under it; I just want to start the Docker containers to test some unrelated functions on my laptop. Those containers happen to start TensorFlow and pgvector, even though I’m not really using either in anger here. Both try to use AVX, get a SIGILL, and the containers fail to start.
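A generic way for software to avoid this kind of crash is to check CPU features at startup and fall back to a non-AVX code path. The sketch below does that on Linux by parsing /proc/cpuinfo, where x86 kernels list features on a "flags" line; ARM kernels use a "Features" line instead, so the sketch finds no AVX there and picks the generic path. The kernel names are hypothetical, and this is not a claim about how TensorFlow or pgvector detect features (native code more typically queries CPUID directly).

```python
import os


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect x86 feature flags from /proc/cpuinfo-style text.

    x86 Linux lists e.g. 'flags : fpu ... sse2 avx avx2' per processor.
    ARM Linux lists 'Features' instead, so this returns an empty set there.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_kernel(flags: set) -> str:
    """Choose a code path the CPU can actually run instead of dying with SIGILL."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "generic"


if __name__ == "__main__":
    text = ""
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            text = f.read()
    print("selected code path:", pick_kernel(cpu_flags(text)))
```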

Maybe we should just build Linux ARM Docker images, but we’re still trying to get some ARM CI machines to build them on (we could just use our laptops, but we want to do it properly).

> My big problem is Rosetta 2 doesn’t emulate AVX which more and more software uses.

You can probably blame that on patents. Base x86-64, which includes SSE2, is old enough that all relevant patents have already expired (the x86-64 ISA documentation was first published by AMD 24 years ago, see https://web.archive.org/web/20000829042324/http://www.x86-64...). Other ISA extensions are newer, and might still be threatened by patents.

Maybe patents are involved, but there's a bigger issue too: Apple chips support neither the equivalent Arm instructions (SVE/SVE2) nor SIMD units with wide enough vectors. Any AVX/AVX2 emulation is going to be dog slow, even if it isn't encumbered by patents.
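To put rough numbers on the vector-width point: Arm NEON registers are architecturally 128 bits wide, so one 256-bit AVX operation on eight float32 lanes has to be split into two 4-lane operations. The toy model below only counts those operations; the function names are hypothetical stand-ins, not real intrinsics. It ignores other costs, such as the sixteen 256-bit ymm registers needing all 32 of AArch64's 128-bit vector registers just to hold their halves, and AVX shuffles that move data across the two halves.

```python
NEON_LANES_F32 = 4  # a 128-bit NEON register holds 4 float32 lanes
AVX_LANES_F32 = 8   # a 256-bit AVX ymm register holds 8 float32 lanes


def neon_add_f32x4(a, b):
    """Stand-in for one 128-bit NEON vector add (4 float32 lanes)."""
    assert len(a) == len(b) == NEON_LANES_F32
    return [x + y for x, y in zip(a, b)]


def emulate_vaddps_ymm(a, b):
    """Emulate one 256-bit AVX float add by splitting it into 128-bit halves.

    Returns (result, number_of_neon_ops_used).
    """
    assert len(a) == len(b) == AVX_LANES_F32
    result, ops = [], 0
    for start in range(0, AVX_LANES_F32, NEON_LANES_F32):
        end = start + NEON_LANES_F32
        result += neon_add_f32x4(a[start:end], b[start:end])
        ops += 1
    return result, ops


a = [float(i) for i in range(8)]
b = [10.0] * 8
res, ops = emulate_vaddps_ymm(a, b)
print(res, "using", ops, "NEON adds for 1 AVX add")
```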

For my use case, I don’t really care much about AVX performance, since I am using it very minimally.

Using QEMU instead of Rosetta 2 gets past this, since QEMU doesn’t seem to be afraid of those patents, but it makes everything else a lot slower.

Maybe Apple could make a plug-in API available for Rosetta 2, letting plug-ins emulate additional instructions. Then some open-source plug-in could implement the missing AVX instructions, and if Intel tried to claim Apple was infringing the AVX patents, Apple could (truthfully) say “we have nothing to do with that plug-in, we just created the API it calls”.

Another approach would be for Apple to open-source Rosetta 2, so a community fork could implement this stuff. I doubt Apple will do that, though; I think they view Rosetta 2’s superior x86 emulation as a commercial advantage over other ARM laptop vendors (such as Qualcomm’s ARM Windows systems), and they’d likely view open-sourcing it as giving that advantage away.

I think it’s more on Nvidia, Qualcomm, or AMD to engineer their own ARM-based chips that can outcompete x86 variants. Both Nvidia and AMD are rumored to be working on general-purpose ARM-based CPUs. Right now Nvidia is the likeliest to do it, since its absolutely obscene margins give it more than enough money for R&D.

I personally don’t think there is a lot of incentive for ARM to make the fastest possible cores. They’d be undermining those who are currently paying the most to license their IP. ARM’s real incentive is power efficiency and then letting the licensees use and abuse that for performance gains.