| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by georgehotz 214 days ago

Full disclosure, we have a contract with AMD to get Llama 405B training on MI350X on MLPerf.

Things are turning around for AMD. If you have an AMD card, go to pytorch.org, click Linux+ROCm and install PyTorch. 3 years ago, this was hopeless. Today, most mainline things work. I ran nanochat on MI300X and it just worked. I think that's true about MI350X now too. The MI350X machine is stable.

They are clearly behind NVIDIA, nobody doubts that. And a lot of investment into software will be required to catch up, ecosystem, compiler, and driver. But 2 years ago they seemed hopeless, now they don't. Things take time. HipKittens is a great codebase to study to see where AMD's LLVM backend is still lacking; compare it to the CUDA Kittens.

For training, it's NVIDIA and Google in first. AMD in second. And nobody in third. Intel and Tenstorrent are not remotely close. Huawei examples segfaulted. Groq gave up selling chips. Cerebras isn't available anywhere. Trainium had a 5 day wait time to get one instance and I lost interest.

4 comments

latchkey 214 days ago

As CEO of an AMD NeoCloud for the past 2 years, it is so nice to hear all this and also see the turn around. It is what I bet my business on from the start and I can concur with what George is saying 100%.

The out of box experience can be a bit rough around the edges on bleeding edge stuff, but it isn't anything near as bad as it used to be. For example, a month ago nanochat wasn't working well and now it is. The important thing is that people now care enough to make it work.

At the end of the day, AI does need viable options. Having a monopoly on all AI hardware and software might be a good thing for share holders, but isn't a good thing for what is looking like a fundamental technology, akin to the internet.

link

ivape 214 days ago

That’s interesting, I was specifically looking for AMD hardware being offered by neoclouds, they seem to be rare.

I like your bet though. The difference between NVDA and AMD has never really existed on a hardware level for decades. AMD has always been on par, and software is software, it will catch up.

AMD will be a stock many people will miss because the opportunity has presented itself at the height of AI bubble talk, and this will leave many in the dust. Doubling and tripling of their market cap is pretty much a forgone conclusion.

link

latchkey 214 days ago

You're right, it is a much smaller ecosystem, but I think that is partly intentional as a way to focus efforts and not feed into the bubble, which I feel is a smart move. These are the official partners [0]. I'm Hot Aisle.

George was very smart, $500k in the $90's. I saw it coming even earlier than him, but that's cause I was already aware the hardware was good from my own experiences.

[0] https://www.amd.com/en/products/accelerators/instinct/eval-r...

link

LogicFailsMe 213 days ago

Will it catch up or will it forever chase nvidia's tail? I'm betting on the latter unless another AI winter happens. And contrary to anti-generative AI social media talking points, the literature suggests The Red Queen's race is continuing apace IMO.

Nvidia remains undefeated at responding to hardware threats with hardware diving catches to this day. What scenario prevents them from yet another one of their diving catches? I'm genuinely curious as to how one could pull that off. It's like challenging Google in search: even if you deliver better product and some have, the next thing you know Google is doing the same thing or better with deeper pockets.

link

ivape 213 days ago

Nvidia remains undefeated at responding to hardware threats with hardware diving catches to this day. What scenario prevents them from yet another one of their diving catches?

The fact that they make roughly the same hardware as AMD for the last 2 decades, and even today. There was no diving catch, AMD just ignored what the hardware was capable of and didn't reinforce OpenCL. There was literally no diving catch. For example, just in this thread alone, AMD paid someone to make this shit work on their hardware. Don't bet against what's coming.

link

LogicFailsMe 211 days ago

Except no, AMD 100% played follow the leader with technology like CUDA, NVLink, and tensor cores.

Even paying paying someone in academia to get s** to work on their hardware is yet another example of follow the leader.

What exactly do you think is coming? I think the biggest threat is one or more Chinese companies catching up on both hardware and ecosystem in the next half decade or so myself, mostly because of the state level support for making that so. But I absolutely don't expect an x86_64 moment for GPUs here given past results and the current bias against software in AMD's HW culture. Convince me otherwise.

link

WithinReason 214 days ago

How far is Tinygrad from being able to represent/search the kind of optimisations listed in the article? i.e.:

  1. data layouts to avoid local memory bank conflicts
  2. read patterns from global memory to optimize L2 cache reuse
  3. warp specialisation

How complex is it to add these into tinygrad?

link

georgehotz 214 days ago

1 and 2 are supported, 1 you need to specify, 2 will be found with BEAM. We are working on reimplementing HipKittens in tinygrad, all the stuff is there to do it. See the amd_uop_matmul example.

tinygrad doesn't support 3 yet, it's not needed on any AMD GPUs, and not needed on NVIDIA consumer. It wouldn't be hard to add, but it's important to figure out how it best fits with the existing abstractions. I think everything will eventually move to a more producer-consumer model.

link

0-_-0 214 days ago

Good luck with the AMD contract! I imagine HipKittens came at just the right time.

link

fulafel 214 days ago

Does consumer hardware (non-MI) need proprietary kernel drivers for running rocm + pytorch?

link

kieranl 214 days ago

No. But you might need a specific version of rocm built for your gpu. These are built on https://github.com/ROCm/TheRock

Right now AI support on AMD is officially only on specific models. But they are working hard to turn this around to have broader support. And making progress.

link

fulafel 214 days ago

Vulkan compute is also getting some good press as a local llm platform (at least on the linux side), will be interesting to see which crosses the line to "can ship production quality apps on this" first.

link

georgehotz 214 days ago

Nope! Works fine with in-tree somewhat recent kernel. The AMD driver is actually open source, not just a wrapper into a big on device blob like the NVIDIA one. tinygrad also has a driver that doesn't even need the kernel module, just mmapping the PCIe BAR into Python.

link

buckle8017 214 days ago

> Cerebras isn't available anywhere.

That sounds like they're winning.

link