by amunicio 1116 days ago
> The trouble is that AMD just didn't take AI seriously.

Until a couple of years ago, AMD was in survival mode, fighting Intel on one side and Nvidia on the other. Two rivals that were making money hand over fist while AMD was bleeding money.

AMD picked open standards and made investments in open source frameworks and libraries commensurate with their financials, the hope being that the community could help pick up some of the slack. The community, understandably, went with the proprietary solution that worked well at the time and had resources behind it.

The net result is that the Nvidia ecosystem has gained a dominant position in the industry and benefits from being perceived as a quasi-standard. On the other hand, open source efforts by AMD or others get viewed as "not serious".

The financial situation of AMD has improved somewhat over the last couple years. So AMD is "taking AI more seriously now". But it might be too late and the proprietary ecosystem has probably won.

6 comments

For what it's worth, AMD is also incredibly proprietary. The drivers being open source really helps with compatibility and your kernel, but you're still interacting with a massive computer running its own OS with its own trusted code solution. And that computer also has DMA to your computer.

I would consider their open efforts to be "not serious" for anyone but the consumer space - games, desktop users, maybe even professional text editors. If you're using the GPUs for "professional" applications in a one-off scenario, even AMD falls short.

I'm honestly not sure what the moral of this story is.

The moral of the story is that Nvidia invested a lot more in low level software developers for their GPU solutions and AMD did not, and it shows.

"Open source" by itself is not a magic dust you can sprinkle on your projects that will make your software work well.

AMD's focus was always on pure compute power at a good price. And they always beat NVidia at that game. AMD cards always had the highest hash rate per dollar in crypto mining. AMD has 100% of the console market and the fastest iGPUs by 2x over Intel.

NVidia decided to use gimmicks to sell their cards including texture compression, lighting tricks, improved antique video encoders, motion smoothing, bad proprietary variable refresh rate, ray tracing, cuda and now machine learning features.

Nvidia is fortunate that machine learning has taken off. That is masking AMD winning market share from weak overpriced NVidia 3D products!

You're mashing together a lot under "gimmicks" there.

Texture compression: Useful for games, ongoing work, although I wish they would make cards with appropriate amounts of VRAM

Lighting tricks: Not sure what this is referencing

Improved antique video encoders: NVENC started out with only h.264, but now it supports h.265 and AV1, which aren't antique at all. Niche, but widely used in the streaming industry.

Motion smoothing: The hardware optical flow accelerators in newer cards are important for DLSS, which is a bit gimmicky but works mostly as advertised.

Bad proprietary VRR: No argument here, G-Sync sucked.

Ray tracing: All 3D games are going to be ray traced sooner or later. Getting a head start on it is a good move, and it's a big head start. The 4090 is ~100% faster than the 7900 XTX.

CUDA: No one can seriously call CUDA a gimmick.

Machine learning features: Tensor cores are great.

CUDA is a gimmick though.

CUDA isn't a "technology", it's a shader language that has been supplanted by better industry-wide standards.... the same standards whose shader languages are compiled by the same Nvidia shader compiler.

CUDA is a moat whose muddy waters have long since run dry, and you're drinking koolaid if you think it's still relevant for greenfield projects.

> and you're drinking koolaid if you think it's still relevant for greenfield projects.

So I want to start a new GPU compute project today. Obviously this will primarily be deployed to AWS/Azure/etc, which means the only high-end GPUs available are Nvidia. What do you recommend developing this application with?

The way I see it, you would have to be drinking koolaid to use anything besides CUDA.

There exists no AMD alternative to CUDA. How is this a “gimmick”?

> CUDA isn't a "technology", it's a shader language that has been supplanted by better industry-wide standards

As someone who uses industry-wide standards in a related field...

The proprietary implementation often has the benefit of several more years of iteration with real products than the open standards. 'Supplanted' can only really be evaluated in terms of popularity, not newness or features, because features on paper aren't features in practice until they pay for their migration cost.

>a shader language that has been supplanted by better industry-wide standards....

Are you talking about Vulkan? If so, I'm not sure 'supplanted' is the right word.

That's a wild perspective. I don't know how you can really come to that conclusion either. One attempt at getting Blender to render something using an AMD vs Nvidia card will paint a very very clear picture.

Calling features which are integral to all modern games and most of which also got adopted by other vendors 'gimmicks' is kind of ridiculous.

You're entitled to your opinion (which I agree with in broad strokes) but with respect, the OP article is specifically about ML. Calling CUDA a "gimmick" is silly and completely underestimates the datacenter/ML cluster market share (it dwarfs consumer GPU), and the fact of the matter is AMD's CUDA equivalent segfaults. So if "being actually usable to the biggest market" is a gimmick, so be it.

> AMD has 100% of the console market

That's not true, since the Switch is based on an Nvidia platform. Since it's still 1/3 of the market, it's not as bad as it used to be.

And yet AMD lately has quietly been priced just slightly below Nvidia with a worse product. AMD sucks, that's just it. Their market share is crumbling and Nvidia's is getting stronger because people are like, fuck it, at that price I might as well just buy the better one that Just Works(TM).

I personally don't have any insider information, but just wanted to add that what you're saying fits with the meta on the gaming community side, where commentators are frustrated that Nvidia has so much hubris that they think they can sell essentially last-generation-level technology without the step up (I think it was 3xxx vs 4xxx or something like that, where you'd expect the 4060 Ti to be at least as good as the 3070 Ti) and just try to make up for it in "software".

It probably takes a lot of confidence in your software developers to make this kind of decision.

A company that goes open source might get the icing for free, but they still have to bake the cake themselves.

Are they "incredibly proprietary" compared to the competition? Clearly they aren't. Nvidia offers blobs in both consumer and professional markets, even going to the extent of gimping hardware performance through drivers on more than one occasion.

That said, I think AMD isn't really competing with Nvidia. Sure, their R&D budget is smallish but it feels like they're somewhat fine with the current status quo.

> Nvidia offers blobs in both consumer and professional markets

So does AMD.

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...

And while they have an open version of the userland, it's also missing features compared to the proprietary one, etc.

Besides, in the end it truly hardly matters whether the firmware is loaded at runtime or lives in updateable flash. It's still not "your PC" in the Stallman sense either way, it's been tivoized regardless of whether firmware is injected at runtime or during assembly. You cannot load unsigned firmware on AMD anymore either, firmware signing started with Vega (iirc) and checksums now cover almost all of the card configuration similar to NVIDIA.

Firmware is also the only way to get proper HDMI support... which is why AMD still does not support HDMI 2.1 on Linux. The HDMI Forum will not license the spec openly, and implementations must contain blobs or omit those features.

https://gitlab.freedesktop.org/drm/amd/-/issues/1417

Hey, I am not white knighting for AMD here. For all we know, they could only have been pursuing open standards because they've been forced to, as the underdog.

Can we really assign blame to them specifically for not fighting the HDMI Forum on our behalf?

Isn't this sort of how specialized hardware kind of works?

At some point, hardware (necessarily?) evolves to become optimized to do one thing, and then you have to just treat the driver as an API to the hardware.

Even "simple" things like keyboards and mice are now small computers that run their own code, and more complex devices like sound cards and hard drives even more so.

And since graphics card performance seems to be the bottleneck in a lot of computing, it has become super specialized and you just hand off a high-level chunk of data and it does magic in parallel with fast memory and spits it out the HDMI cable.

For the keyboard/mouse now being small computers, that's been true since the 1970s. Almost all keyboards for a period of about 30 years had an 8048 or 8051 CPU. It's how they serialized the keystrokes. From the Model M keyboard through to everything up till the USB era.

In the 70s it would be more common to have an MSI part that ran matrix scanning and spit out parallel bus ASCII. UCs were still spendy.

>own OS with its own trusted code solution

AMD is working on moving to things like the open source form of AGESA. They plan to start deploying openSIL by 2026.

> I'm honestly not sure what the moral of this story is.

That people will go with what is easier and works?

That open source and open standards don't win by default? That it takes a lot of persistence and effort.

What OS do you mean? The closest thing I can think of is the embedded CPU that gets called CP in the ISA docs, which mostly schedules work onto the compute units. That has firmware which is probably annoying to disassemble, but it's hard to imagine it doing anything particularly interesting.

The moral is that PSP FUD has nothing to do with AMD's lack of success in AI.

Nah. AMD was already profitable in 2018. This is just big mismanagement.

Just having 30 extra good software engineers focusing on AI would have made such a massive difference, because it's so bad that there's a lot of low hanging fruit.

As someone who was pretty invested in AMD stock since 2018, it always made me pretty angry how badly they managed the AI side. Had they done it well, just from the current AI hype the stock would probably be worth 50 bucks more.

> Nah. AMD was already profitable in 2018. This is just big mismanagement.

Hindsight bias much?

How easily we forget in today's speculative AI bubble that AMD rolled into 2018[1] with 6.1x levered D/E and substantial business uncertainty while the Fed was actively ratcheting interest rates up, and ended the fiscal year still 3.3x levered despite turning operationally profitable[2].

> Had they done it well, just from the current AI hype the stock would probably be worth 50 bucks more.

It strikes me as pretty audacious and quite unconscionable to assert "big mismanagement" while simultaneously crying about speculative short-term profit taking opportunities.

[1] https://www.sec.gov/Archives/edgar/data/2488/000000248818000...

[2] https://www.sec.gov/Archives/edgar/data/2488/000000248819000...

Hey, the stock only 10x'd since 2018, we could do better, couldn’t we?

Not really hindsight bias.

As someone who had like 25% of their portfolio in AMD, it was pretty infuriating being forced to buy Nvidia GPUs every single time because the AMD ones were literally useless to me (lack of AI support and CUDA in general).

Yes, there's AI hype right now. But Nvidia GPU datacenter growth isn't new. And AMD were asleep.

Not asleep; they just directed their efforts at things that haven't worked out. With their APU lines it looked like they wanted to integrate GPUs completely into the CPU - that was hardly asleep to the importance of GPU compute.

The problem they ran into looks to me to be that they focused on targeting a cost-effective low end market and were caught off-guard by how machine learning workloads work in practice: a huge burst of compute to train, then much lower requirements to do inference. That isn't something they were strategically prepared for, and that isn't something the software industry has seen before either.

Won't save them from market forces, but their choices to date have been reasonable.

Look long and hard at AMD's financials circa 2015[1]...for the sake of anticipated TL;DR, here are a few summary highlights:

  - -27.5% YoY revenue decline
  - -6.3% YoY gross margin decline
  - -$481M operating loss
  - $230M short-term debt
  - $388M non-cancelable operating lease commitments
  - $538M unconditional purchase commitments
  - $2.032B long-term debt (!)
  - -$412M stockholders' deficit (!!)

Seriously, look long and hard at those numbers, and when you think you understand what they might mean, consider them again and again until the feeling of insurmountable adversity sinks in and you're on your knees begging public equity markets for an ounce of capital and a pinch of courtesy faith...on the promise of meaningful risk-adjusted ROIC to be delivered in just a few years.

> But Nvidia GPU datacenter growth isn't new. And AMD were asleep

...which is why this remark comes off as sheer arrogance (no disrespect).

Su and the rest of AMD leadership certainly weren't asleep. The difference here is while you're busy scouting speculative waters defended by competition with deep battle pockets and an even deeper technical moat, Su was simply preoccupied bringing a zombie company back to life and building up enough health to slay a weaker giant.

Personally, I was already beyond impressed with one miracle delivered.

[1] https://www.sec.gov/Archives/edgar/data/2488/000000248816000...

>As someone who had like 25% of their portfolio in AMD

>Nah. AMD was already profitable in 2018. This is just big mismanagement.

I guess you know they had debt, and they were paying it off, and were battling with other issues all the way till 2019 / 2020 when Intel had their misstep so they could gain something in the CPU server market?

Yes. And they still could have afforded 30 software engineers to work on ai/compute painpoints.

But let's assume they thought it was too expensive back then. There's still no reason not to invest in software in 2020 when their gross margin was absurd.

Yup. Have 15 of those software engineers contribute pull requests to PyTorch to make its OpenCL support on par with CUDA and take the other 15 engineers to do the same for TensorFlow and AMD would already be a serious contender in the AI space.

Does AMD have tensor cores?

I'm not so sure anymore. The big reason is that now that the ML framework ecosystem has fragmented into different "layers" of the stack, very few people are directly writing CUDA kernels anymore.

As a result, with things like XLA now supporting AMD GPUs using ROCm under the hood, the feature gap has closed A LOT.

Sure, Nvidia still has the performance crown, with cuDNN, NCCL, and other libraries providing major boosts. But AMD is starting to catch up quite fast.

> it might be too late and the proprietary ecosystem has probably won.

Compiler ecosystems can and have changed rather quickly. Especially given that most NNs run on a handful of frameworks. Not _that_ many people are writing directly on top of CUDA/cuDNN.

Make an equivalent toolchain that runs on cheaper hardware and the migration would be swift.

Currently AMD hardware is a bit behind and the toolchain is frustratingly buggy, but it's probably not as big of a moat as NVIDIA are trading on. Especially since NV's toolchain isn't particularly polished either.

>AMD picked open standards and made investments on open source frameworks and libraries commensurate with their financials, the hope being that the community could help pick up some of the slack.

This has been their claim, but more often than not they haven't actually done anything to encourage the community to pick up slack. So many of their graphics tools have been released with promises of some sort of support or of working with the community yet have basically had nothing to help the community help them.

Even accepting the unreasonable idea that they can't afford the full-time developers for the various tools and libraries they come up with, they often don't even really work with the community to build and maintain those.

One of the bigger cases which contributed to turning me off from AMD GPUs was buying a 5700XT at launch, eager to work on stuff using AMD specific features, only to be led on for over a year about how ROCm support was coming soon, every few months they'd push back the date further until they eventually just stopped responding at all. Trying to develop on their OpenGL drivers was a similar nightmare as soon as you wandered off the old well worn paths to more modern pipeline designs.

Another glaring example would be Blender's OpenCL version of Cycles, which was always marred with problems and hacks to work around driver issues. They tried to work with AMD for years before finally just dropping it and going for CUDA (and thus HIP) even though AMD's HIP support, especially on Windows, is still in a very early state.

They've been getting piles of money from Ryzen for 5-6 years now. How long am I supposed to wait?

According to the latest ROCm release notes, it supports Navi 21. Well, at least the pro models. It doesn't even mention the 5000 or 7000 cards. My current understanding is that 7000 support is mostly there a few months late and 5000 was abandoned partway done after years of vague promises.

At least it might support Windows soon. Not my sub-4-year-old GPU, of course, god forbid. But most of the rest of them.

AMD wasn't very profitable until 2018. The company's debt to equity ratio was terrible (due to previous CEO mistakes 2000-2012) until they paid off their huge debts with Ryzen 3 in ~2020. Be patient, grasshopper ..

https://www.google.com/search?q=amd%20debt%20to%20equity%20r...

> They've been getting piles of money from Ryzen for 5-6 years now

Hardware is very capital intensive. They've not been making much until much more recently. From 2012 through 2017, almost all years were a net loss. They hit $1B net profit only in 2020. I imagine quite a bit of that money went into keeping/accelerating the pace of Ryzen, and paying off debts. Only now do they have more breathing room for other endeavors. If they had diverted a chunk of that change to AI, they probably would have a lower performing Ryzen right now.

So no, they did not have piles of money.