by daguava 1477 days ago
There are a lot of claims of poached talent in the article, basically claiming [paraphrasing] "Apple, by maintaining a stressful work environment and not paying enough to make up for it, lost some rockstars"

How true is this? If they're on the money it's an excellent example of a talent retention miss leading to a demonstrable mediocrity in delivery.

7 comments

Seems silly to me. CPU designs are important, but these companies have more than enough engineers to make competent designs even with some people leaving. There's another factor that completely dominates. It's all about the fabs. Intel lost the performance lead, was it because of their designs? No, it's because they lost the lead in fabs. AMD passed Intel, was it because of their designs? No, it's because they use TSMC's fabs and TSMC passed Intel. Apple blew everyone away with M1, was it because of their designs? No, it's because they paid TSMC boatloads of money for exclusivity on their latest fabs. Apple M2 disappoints on CPU performance, is it because of their designs? No, it's because TSMC's next fab isn't ready yet so they're still using the same fabs as M1.

These days I care more about which TSMC process node my chips came from than which company designed them. I need a new computer but I'm waiting until next year because there will be a wave of new CPUs and GPUs coming out with much better performance. Better designs? Maybe a little, but it's really because they're all moving to TSMC N4.

I really hope Pat Gelsinger can save Intel's fab business because we really need another company that can compete in fabs and Samsung isn't doing too hot either.

> No, it's because they lost the lead in fabs. AMD passed Intel, was it because of their designs? No, it's because they use TSMC's fabs and TSMC passed Intel. Apple blew everyone away with M1, was it because of their designs? No, it's because they paid TSMC boatloads of money for exclusivity on their latest fabs.

The fixation on the fab process is bewildering. Yes, it does help, but it is also an optimisation step that is decoupled from, and has no bearing on, the chip design. Yes, a smaller node size brings increased density and more things that can be whacked into the same sized piece of silicon, but it will not magically improve the overall system performance or result in linear architectural scalability.

The article specifically calls out a potentially decreased ROB size in M2 cores, and ARMv9 potentially not arriving until M3, both of which are crucial to speed and software performance. There is absolutely nothing the fab process can do to make SVE2 and matrix instructions automagically appear in lithographic chip designs – those are «silicon» design time decisions. As we have recently been seeing more and more practical, mainstream uses of SIMD instructions at the C/C++/Rust runtime level that bring order-of-magnitude performance gains, having an SVE2 implementation at the ISA level is becoming somewhat critical.

Stuff like adding SVE2 can be great for specific applications but it's really marginal when looking at whole system performance. What's not marginal are the improvements in power efficiency and room for more cache that come with new process nodes. These chips are power constrained in almost everything they do, because of heat dissipation or battery life or both. Less power and more cache benefits everything automatically, not just the very few things that actually start using new SIMD instructions or other new hardware blocks each year.
> Stuff like SVE2 is really marginal when looking at whole system performance.

It is not. A recent paper (https://arxiv.org/pdf/2205.05982.pdf) from Google engineering compared the performance of a vectorised (SIMD) vs non-vectorised implementation of quicksort in the Highway library, as well as the performance difference between the AVX-512 and NEON/SVE1 implementations. By switching to SIMD processing alone, a 9-19x speedup was reported, depending on the SIMD unit size (32/64/128-bit numbers were sampled and measured). Even the smaller of the two, the 9x performance gain, is far from marginal.

On the SIMD unit size side of things, the NEON/SVE1 implementation (478 Mb/sec throughput on average) is 2.4x slower than the AVX-512 one (an average of 1120 Mb/sec was measured), largely due to the smaller width of the processing units. Again, a 2.4x factor is not in marginal territory.

> What's not marginal is the improvements in power efficiency that come with new process nodes.

And that is an optimisation step, albeit a very important one. However, it will not make a quick sort implementation run 2.4x faster alone.

You completely ignored the "whole system performance" part of my statement. What percentage of your CPU time is spent running SIMD-optimized implementations of Quicksort? Now apply Amdahl's law.
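For anyone who wants to actually apply Amdahl's law here, a minimal sketch follows. The fractions of CPU time in the loop are made-up placeholders, not measurements; only the 2.4x and 9x local speedups come from the numbers quoted upthread.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total runtime gets `local_speedup` times faster.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical shares of CPU time spent in the accelerated code path.
    for fraction in (0.01, 0.05, 0.25, 0.50):
        # Local speedups quoted in the thread (2.4x and 9x).
        for local in (2.4, 9.0):
            overall = amdahl_speedup(fraction, local)
            print(f"p={fraction:.2f} local={local:.1f}x -> overall={overall:.3f}x")
```

The overall number is driven almost entirely by the fraction `p`, which is the quantity the two sides of this argument disagree about.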
«Whole system performance» is a meaningless term as it is a function of many, usually poorly controlled, input variables, and your whole system is different from my whole system. If my VPN tunnel allows me to have faster transfer speeds simply by virtue of having ISA assisted optimisations in the cryptographic library it uses, the net result will be very noticeable to me but perhaps not for you unless you also have to use the same VPN client.

Even the web browser you are using right now to comment on HN likely makes use of the very same Highway library (Chrome and Firefox certainly do, unsure about Safari) that the speedup gains have been reported for. The «overall» browser performance will also improve as a result, since the gains arrive transparently, simply by dropping an optimised implementation into the browser build.

Soo... basically a 2x speedup in going from 4x128b to 2x512b ALUs, after discounting the frequency difference. But realistically, Intel's client configurations are 3x256b, which is only 25-40% faster in that paper.

(I suspect any application doing enough quicksort that the 2x speedup is significant, would be even happier going slightly off-core to a coprocessor more specialized in vector processing, like Hwacha. There's plenty of space between "tightly-coupled CPU SIMD" and "GPU" that I think makes more sense than needing to implement 512-bit registers in little cores.)

> Soo... basically a 2x speedup in going from 4x128b to 2x512b ALUs, after discounting the frequency difference. But realistically, Intel's client configurations are 3x256b, which is only 25-40% faster in that paper.

A 2.4x difference was, in fact, reported; however, I still find it somewhat difficult to interpret the reported results. The processing unit size difference alone and the number of ALUs can't account for such a big difference in transfer speeds, as the M1 Max that was used in the assessment has a very wide memory bus (256 bit wide for a performance core cluster or 512 bit wide for the entire SoC) as well as an unusually large L1-D cache and a large L2 cache, with both caches having deep TLBs. The test set they used could also fully fit into the L2 cache. I have asked the Google engineer a question in a separate thread about what else could influence the observed performance difference but have not received a satisfactory explanation.

This is an interesting question. Has there been much uptake of such accelerators/coprocessors? One concern is that by the time the HW is ready, SW wants to do something different, perhaps fusing some other step with the sort. Another is deployability: everyone has SIMD/vectors on board, but even GPUs aren't quite everywhere nor so easy to scale out.

Also, there are now several RISC-V CPUs with 512-bit vectors, and it seems fair to call them little cores especially compared to x86 and M1/M2. Perhaps 512-bit is more feasible (and sensible) than is widely believed?

> adding SVE2 can be great for specific applications but it's really marginal when looking at whole system performance. What's not marginal are the improvements in power efficiency

Depends on the applications, I suppose. But did you know that (at least on OoO x86), the energy cost of scheduling an instruction dwarfs that of the actual computation? That is why SIMD, including SVE2, can be so important - it amortizes that cost over several elements. Let's spend (more of) our energy budget on actual work.

Is it really just "very few things that actually start using new SIMD"? I'm not a huge fan of autovectorization, but even that is able to vectorize some fraction of STL algorithms. And there are several widely used libraries, including image/video codecs and encryption, that use SIMD and wouldn't be feasible otherwise.
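The amortization argument above (one scheduling cost shared across several SIMD lanes) can be put into a toy model. This is only a sketch: the cost units are arbitrary and the 10:1 ratio between scheduling and compute is a hypothetical placeholder, not a measured figure for any CPU.

```python
def energy_per_element(schedule_cost: float, op_cost: float, lanes: int) -> float:
    """Energy spent per data element when one instruction handles `lanes` elements.

    The scheduling cost is paid once per instruction and shared by all lanes;
    the compute cost is paid once per element.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    return schedule_cost / lanes + op_cost


if __name__ == "__main__":
    # Hypothetical costs in arbitrary units: scheduling 10, computation 1.
    for lanes in (1, 4, 8, 16):
        cost = energy_per_element(10.0, 1.0, lanes)
        print(f"{lanes:2d} lanes -> {cost:.3f} units per element")
```

In this model the per-element cost approaches the pure compute cost as the lane count grows, which is the "spend the energy budget on actual work" point.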

Android flagships are shipping with SVE2 as of this year, which I actually didn't realize until like two weeks ago because there's been nearly zero buzz about it. What's SVE2 being used for over NEON as of now?
Low level runtime optimisation that yields substantial performance gains in the user facing or system level software, ranging from cryptography through to data processing algorithms and very high throughput JSON parsing.

Take OpenSSL as an isolated example. By simply fiddling with the C compiler flags to allow it to use NEON on M1, the sha256 calculation speed-up is 4x for 128 and 256 block sizes, with performance gains quickly tapering off for larger block sizes and resulting in only a modest 10% increase. And that performance increase happens without the hash functions having been manually optimised for NEON/SVE1.

SVE2 with its variable vector size support could improve performance for larger unit sizes. Perhaps it is time to spin up a Graviton3 instance and poke around with clang/gcc to see how good or how much faster SVE2 actually is.

Yeah that's NEON. And there's instructions that literally calculate SHA256 so generalizing that is moot. My point was first, what real benchmarks are there of SVE2's benefits over NEON with mainstream CPUs that M2 would compete against? Unlike AVX-512, NEON was already pretty rich, so the new instructions have rather specialized usefulness.

Because outside of servers where little cores don't exist, 256b ALUs in big cores mean 256b registers in little cores, and Cortex-A510 is way smaller than Gracemont. And then you're giving Samsung another opportunity to screw up big.LITTLE...

And even the server CPUs with SVE are 2x256b except A64FX which is HPC exclusive, so no better than 4x128b.

SVE2 does not increase the maximum speed. That depends only on the width and number of the ALUs, on the number of cores and on the clock frequency.

The purpose of SVE2 is to simplify the writing of the software that exploits the data parallelism, both when that is done manually and when that is done automatically by an autovectorizing compiler.

With SVE2 it should become much easier to deal with data structures where the sizes and the alignments are not multiples of the ALU width and it will also no longer be necessary to write many alternative code paths, to take advantage of any future better CPUs, like when optimizing for Intel SSE/AVX/AVX2/AVX-512.

The majority of programs still do not use the existing SIMD units as often as they could. With SVE2, their number should diminish.
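To illustrate the point about sizes that are not multiples of the ALU width, here is a rough simulation of a vector-length-agnostic loop. It is plain Python, not real SVE2 code: `vla_add` and `vector_lanes` are hypothetical names, and the per-chunk mask is only loosely modelled on the predicate that SVE's `whilelt` instruction produces.

```python
from typing import List


def vla_add(a: List[float], b: List[float], vector_lanes: int) -> List[float]:
    """Element-wise add, processed in chunks of `vector_lanes` with a masked tail."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_lanes < 1:
        raise ValueError("vector_lanes must be at least 1")

    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        # Predicate: True for lanes that still map to a valid element.
        active = [i + lane < n for lane in range(vector_lanes)]
        for lane, is_active in enumerate(active):
            if is_active:
                out[i + lane] = a[i + lane] + b[i + lane]
        # The same loop works for any lane count; no separate tail code path.
        i += vector_lanes
    return out


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [10, 20, 30, 40, 50]
    # Same source "runs" on simulated 2-, 4- and 16-lane hardware.
    for lanes in (2, 4, 16):
        print(lanes, vla_add(xs, ys, lanes))
```

The loop body never hard-codes a width, so the same code covers narrow and wide hardware, which is the contrast with writing separate SSE/AVX/AVX2/AVX-512 paths.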

The fab process is very important. A fast design is nothing if you can't build it.
Key talent is incredibly important. If you lose enough senior engineers, it doesn't matter how talented the rest are. You've lost so much institutional knowledge that is either extremely difficult or impossible to regain. And Apple is notoriously under-staffed for a lot of their projects. With the staffing losses to Nuvia, I wouldn't be surprised if they lost enough key talent that it's going to take them a long time to recover and be able to deliver significant performance improvements again. That's what happens when you treat software developers/engineers like commodities.
I thought AMD succeeded because they managed to get Jim Keller and other great engineers? Unsure why you're placing your hope on a CEO.
No one person deserves the credit for a processor and if Intel had delivered on their roadmap (Ice Lake in 2017 and Alder Lake in 2019) AMD would be dead today.
AMD survived bulldozer. They survived not because of the quality of their chips, but because there exists a sizable x86 market that is literally anything except Intel.
Why does that market exist?
I would expect a large part of it to come from personal dislike in dealing with Intel by people in the position to pick parts and make the sourcing decisions.
Because, before the mobile explosion, we had the PC industry explosion which created a huge demand for X86 PCs and Macs in every home.

And even after the mobile revolution shrank the demand for X86 PCs, the cloud revolution further entrenched X86 in the cloud.

The author has been pushing this conjecture for the past year or so, and has been repeatedly called out on the Hardware reddit.

I would recommend not taking their business conjecture without a giant pinch of salt. Just today they were claiming Apple has lost hundreds of engineers in the chip division. The idea that a single division somehow lost hundreds without the industry noticing is ridiculous.

I wondered about this too and I like your advice about salt but apparently Apple is suing Rivos about this very thing: https://www.reuters.com/legal/litigation/apple-lawsuit-says-...
The article says 40 employees not hundreds, but I imagine that 40 of Apple's top chip talent is going to hurt, that is a lot of brain power to lose!

Seems some employees took more than themselves to Rivos. "at least two former Apple engineers took gigabytes of confidential information with them to Rivos."

Remember Apple cancelling their contract with Imagination (GPU) and hiring their employees to work on Apple's GPU?
Hurts donut.
I wonder if that’s subject to criminal prosecution or merely civil remedies.
I never said or implied that they didn't take many, just not hundreds. In the tens? I believe that. Up to a hundred? That's a tall order. Multiple hundreds? That's catastrophic to any org, including one as large as Apple
I agree, reading this it just seems like this author has found a market for people who want to read news about how AAPL is going to drop tomorrow
I remember reading ESR's blog a decade (or more) ago where every single technical advancement was going to lead to Apple's doom. Every new competitor that popped up was going to lead to Apple's doom. Every legislative initiative was going to lead to Apple's doom. After awhile, I stopped reading his blog because despite a lot of good insight in some areas, his cheerleading for Apple's Doom had clearly created too much bias in his judgement for me to take anything he said seriously.

Apple will eventually be overtaken by another company at some point, but there's a world of journalists and pundits who continue to cry wolf every day.

Why do you think the industry hasn't noticed? If it's not hundreds, how many Apple employees have moved to Nuvia and Rivos?
It has noticed. Look at Apple architects, validation, layout, etc engineers moving to Nuvia + Rivos + Google + Amazon + Microsoft + Meta + Intel + Nvidia + AMD + Apple + Qualcomm.

It's there.

In multiples of tens, I would believe it. Up to a hundred over a couple of years? That's a stretch but possible if you count a very wide range of roles. Multiple hundreds as they imply on Reddit? That would be catastrophic to any company, even one as large as Apple. You would certainly see it reflected in their job postings after even a few, let alone a hundred+
Looking at another article from the same author[0], we'll have a pretty solid answer to the impact in September with the release of the A16. Apparently the A15 had very minimal CPU gains clock-for-clock over the A14.

To quote from that article:

"SemiAnalysis believes that the next generation core was delayed out of 2021 into 2022 due to CPU engineer resource problems. In 2019, Nuvia was founded and later acquired by Qualcomm for $1.4B. Apple’s Chief CPU Architect, Gerard Williams, as well as over a 100 other Apple engineers left to join this firm. More recently, SemiAnalysis broke the news about Rivos Inc, a new high performance RISC V startup which includes many senior Apple engineers. The brain drain continues and impacts will be more apparent as time moves on. As Apple once drained resources out of Intel and others through the industry, the reverse seems to be happening now."

I was very optimistic on Apple on the CPU front until I read this today. Now I'm waiting to see how the A16 pans out for them to see if it's a two generation loss of progress, or just a single generation stumble.

0: https://semianalysis.substack.com/p/apple-cpu-gains-grind-to...

I don’t know Apple’s turnaround, but processors are released in products only long after their design is completed. Think at least several months, even likely a year+ between design and release.

Nuvia started early enough to be a factor here. But Rivos wasn’t even founded until June 2021. To release now, M2 would already have been finished with design by then.

Chip manufacturing is difficult, this reminds me of the Japanese entry into the semiconductor market.

There is an excellent video on this for anyone interested in Japanese culture and the war against USA via semiconductors:

https://youtu.be/bwhU9goCiaI

Reads like FUD to me
Likely it's a bit of hyperbole to get views.

I think there's always a desire to work at a startup in SV and in a low/zero interest rate environment - VCs could probably fund something in the chip design space.

But now that interest rates are going up, I think that will be a lot tougher and Apple will be in a better position due to their direct access to free cashflow - to either compete or acquire them at a later date.

It's also an observation that w.r.t. chip design and consumer electronics, the pay is generally lower than at, say, Google, Facebook, Salesforce, Web 2.0 based startups (i.e. AirBnb, Uber, DoorDash), etc.

My presumption is that this is because as a chip designer or embedded software/hardware engineer, the capital costs to do anything interesting on your own as a startup (i.e. tape out a chip, mass production in Asia, etc.) are very very high and very fixed and very up-front. Even fabless semiconductors and factory-less product design companies that outsource manufacturing to Asia would need to go find outside capital for IC masks or HW prototypes. You also need a cadre of supply chain, biz dev, marketing, ad spend, channel sales distribution.

Compare that to AirBnb, Dropbox where you need a good idea, a handful of 10x SW engineers and an AWS account that can scale as you grow and a free tier for onboarding customers. Therefore, Google/FB etc. need to pay more to prevent these folks from going off and starting their disruptor (i.e. Insta, WhatsApp, SalesForce).

++, totally. I think a lot of people are linking CPU engineering with SW engineering because they both work with the same product at the end of the day, but the industries are radically different both from a business and culture standpoint. The "go fast and break things" mentality that pervades the SV software startup scene is, in my experience, nowhere to be found in hardware because it's both incredibly costly to make any mistakes and because most CPU divisions are led by people with decades of experience (rather than the mishmash that is startups).

The author's argument here about talent leaving after having "gotten Apple off x64" is such an odd take. It's not as if Apple started designing these chips after the M1 launched—the pipeline for even small SoCs is often five or more years. The bit about Rivos is especially bizarre because that company was founded in 2021, well after this chip must have been taped out.

Yep - also was going to add but it was getting kinda long...

With respect to Rivos, reading the about page - it seems an interesting take on RISC-V.

My take is that this will be rolled back into either Apple or Google at a later date - mostly as a hedge against someone (like Nvidia) acquiring the ARM IP now that it's in play - or to provide some realistic alternative that can be used as a counter bid in licensing discussions with ARM.

Two of the founders of Rivos were involved in PA Semi, which was acquired by Apple, and Agnilux, which was acquired by Google's Chromebook team.

I suspect we're looking at Apple implementing tick/tock, whether because they're forced to or because they want to - they've already been doing something similar on the iPhones, and supply constraint may make them do it on the chips, too.

Few people are going to upgrade from the M1 to the M2 anyway, so it makes sense to keep powder dry for the M3.

Intel tick-tock was alternating microarchitecture change with process optimization every other year.

It looks like M2 is neither of those, and it's already been 2 years.

Work culture can mean a couple of things though. Building and delivering the M1 was probably a great experience. Maybe like the hardware equivalent of greenfield development. The M1 is out, and now it's about continual refinements. The people who love going from 0->1 are not always the same people who enjoy going from 1->100.

And while Apple isn't the max payer in SV, I'm sure they pay fine compared to other big tech. The issue is, chips are big right now and no existing big tech can compete compensation-wise with shares in a growth chip startup. With VC drying up, I expect this to change back in Apple's (and other big tech's) favor.

The youngest, strongest RTL engineer I know jumped from Apple to Rivos.
I'm still amazed at how jumping to a rival firm like this is possible in California, those of us in pretty much the rest of the country are locked behind non-competes.
That’s quite an exaggeration - many states disallow or severely limit non-competes, and in many of the states that allow them, they are often unenforced, or easy to get around. So yeah - some people in some parts of the country are locked behind non-competes (if they aren’t willing to move), but it’s hardly everyone.
That is a gross exaggeration of the situation.

California has a total ban on non-competes.

A very small handful of other states put restrictions on non-competes, but even those generally allow non-competition agreements if time limited, and the employee makes over ~$100k.

It’s widely accepted that prohibiting non-competes has been a significant factor in the tech industry success in California.

As one example, it is well known that Amazon aggressively enforces non-competes, even against line engineers.

Non-competes without a monetary attachment are hard to enforce from the employer side. Judges don't look kindly on preventing someone from making a living. Of course companies hope the threat of going to court makes people back down - like Amazon, who is a known bully in this area.

But yes, I wish all states would just ban them outright. Or at least make them require compensation. If an employee is important enough to require a non-compete, then they are important enough to pay during the non-compete time period.

Does CA also ban them as part of an acquisition? I've seen them as part of the sale so everyone doesn't quit the day after the acquisition and start a competitor.

There are no non-competes of any kind in California, at all, ever.
Horace Greeley had the answer 150 years ago.
I am curious. Would it be possible to provide a more direct reference?
Thanks! Apparently, I knew the quote, but not the author of it.