by monocasa 2312 days ago
It's interesting that the actual quote from the investor call is that it's a processor designed in house, and doesn't call out ARM.

IMO, an x86_64 chip makes way more sense. The patents are about to expire. Removing nearly all of the legacy-mode-only cruft (which is not as much as you might think, but tends to be in the critical data path) and making a chip that runs at least x86_64 user-mode code would align with how they removed 32-bit support in Catalina.

5 comments

The patents for x86_64 might be expiring soon, but SSE3/4 and AVX1/2/512 are newer. I'd imagine there is a lot of performance-critical code that makes use of those extensions, and that's just the vector stuff. The x86 architecture has added a lot of other new extensions in the past 20 years as well.
Yeah, but for that, Apple very well might have enough patents in the CPU design space to negotiate a license at this point.
> legacy mode only cruft (which is not as much as you might think, but tends to be in the critical data path)

I'm curious about what you're thinking about here. In fact almost all the code paths for user mode code are running out of the uOp cache in modern devices and completely decoupled from the legacy stuff. And even in the kernel, doing locking and mode switching on the normal paths doesn't hit any major fallbacks. There's a ton of microcode and other legacy handling for odd stuff for sure, but really not on performance loads.

One example: the segmentation hardware needs to be evaluated in the TLB lookup path between L1 and L2. Even if you special-case base=0, length=4G (and take the slow path otherwise), just adding an extra mux there is still a minor burden in the designs I've heard about.
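
To make the fast-path/slow-path split concrete, here's a toy software model of 32-bit segment translation. It's only a sketch: the names (`linear_address`, `FLAT_BASE`, `FLAT_LIMIT`) are made up for illustration, the `ValueError` stands in for a hardware fault, and real hardware does this with parallel logic and a mux rather than a branch.

```python
# Toy model only: hypothetical names, not taken from any real design.
FLAT_BASE = 0
FLAT_LIMIT = 0xFFFF_FFFF  # highest valid offset for a 4 GiB segment


def linear_address(offset, base=FLAT_BASE, limit=FLAT_LIMIT):
    """Return (linear_address, path) for a 32-bit segmented access."""
    offset &= 0xFFFF_FFFF
    if base == FLAT_BASE and limit == FLAT_LIMIT:
        # Fast path: flat segment, so the linear address is just the offset.
        return offset, "fast"
    # Slow path: check the limit, then add the base (wrapping at 32 bits).
    if offset > limit:
        raise ValueError("segment limit violation")
    return (base + offset) & 0xFFFF_FFFF, "slow"
```

The point is that even when nearly every access takes the flat case, the check that selects between the two paths still sits in front of every address.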

Also, the instruction decode cases for 16-bit mode are still in the main instruction decoder and not ucode AFAIK. They're almost the same encoding, and there's not enough ucode space for it all, but removing those cases from the muxes there would help power consumption. Yes, you run out of the uOp cache a lot of the time, but not as much as you might think, and AFAIK the instruction decoder is still cranking away in the background because you want it to be immediately available as soon as an instruction is not in the uOp cache. That means there are power savings to be had there.

I'm incredibly late with this reply and I completely understand that you may never see it, but I'd be extremely interested to dump myself in the deep end of stuff that's on the same wavelength as what you're describing, if you have any suggestions for resources I might be able to follow up on. Thanks!
How hard would it be to re-use a lot of the ALU, MMU, and other components from the Apple A line of ARM64 chips with a different decoder and pipeline? Pretty much all modern chips "emulate" their instruction set anyway with the real core being a proprietary uop machine.
ALU - easy, it's pretty much orthogonal to ISA layout

MMU - they're pretty different

I def bet that if they're making an x86 chip, it shares a lot of RTL with their A series cores, but the distinction is probably more like they have a shared library of a lot of primitives, and have pretty different uarchs built from them.

Would they do something crazy like x86-64 usermode and aarch64 kernel mode? You might be able to share more of the MMU and so on with that - though given the memory-ordering differences it would still be difficult.
The thought of running embedded x86_64 excites me.
They’re also pretty close buddies with AMD right now, who they share a fab (TSMC) with. Could we be seeing something akin to an Apple-flavored Ryzen?
On one hand, everyone who cares about the newest, smallest nodes either has their own fab (Samsung, Intel) or uses TSMC. AMD and Nvidia share TSMC, for example, and they are far from buddy-buddy.

On the other hand, AMD legitimately does have fairly close ties to Apple. Jim Keller has bounced around a lot, but Apple and AMD are where he started major new uarchs. And Hygon Dhyana and the game consoles show that AMD is more than willing to work with high-volume OEMs on semi-custom designs, particularly to empower their security architectures. Yes, Intel includes custom logic for security, but not to the same degree as AMD. I think Intel includes all of their customers' custom logic on most of the masks, but fuses off or otherwise hides the functionality; AMD goes hog wild with custom masks.

You've given me a bunch to think about, thanks! I hadn't really considered AMD here.