HN Mirror

by throwaway888abc 598 days ago

"Furthermore, AMD OLMo models were also able to run inference on AMD Ryzen™ AI PCs that are equipped with Neural Processing Units (NPUs). Developers can easily run Generative AI models locally by utilizing the AMD Ryzen™ AI Software."

Hope these AI PCs will also run something better than a 1B model.

What is it useful for? Spellcheck?

3 comments

lumost 598 days ago

The point is that AMD is doing the legwork to ensure that AI models can run on their chips. While they could settle for inference workloads (port llama to AMD), it is unlikely that many teams will widely adopt their silicon unless it can be used in the end-to-end ML stack. Many pure OSS efforts have tried and failed to make AMD work for this use case.

As a chip maker - they will also have some undersold, QA, or otherwise wasted parts available for these training efforts - so the capex is likely less severe for them compared to a random startup betting on AMD.

cyberax 598 days ago

It's amazing how NVidia became worth $3T simply because they have better drivers and CUDA.

AMD has great hardware, but they never could be assed to do anything about their software.

nmstoker 598 days ago

"utilizing the AMD Ryzen™ AI Software" sounds really unappealing! Like when companies don't realise you think their software to leverage hardware is bad and you'd prefer being able to use features via something generic.

anon291 598 days ago

It's not really. Anyone who's ever done any low-level assembly coding on modern chips knows that it is already a herculean engineering effort. The idea that your customers, who are experts in machine learning models (like transformers, activation functions, etc) are going to feel comfortable with memory hierarchies, synchronization, floating point precision, etc is just crazy.

cyberax 598 days ago

Yes, that's what I mean. NVidia provided easy to use tooling (CUDA), and made sure it JustWorks everywhere.

AMD did approximately nothing with ROCm.

Investing $10-20m of developer time into making ROCm work reliably easily would have paid for itself 100x.

almostgotcaught 598 days ago

> Investing $10-20m of developer time into making ROCm work reliably easily would have paid for itself 100x.

I love when outsiders throw around random-ass takes like this. Just curious: how'd you come up with this number? Is it backed by literally any thought/data/roadmap?

Let's do some rough back-of-the-envelope calculations: 20MM is 100 engineers working for 1 year. Or maybe it's 5 years of work for 20 engineers? Which one of those perspectives (if any!) sounds to you like a reasonable assessment of the gap between AMD and NVIDIA?
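
For readers who want to check the arithmetic, here is a minimal sketch. The cost of $200k per engineer-year is an assumption implied by "20MM is 100 engineers working for 1 year", and the function name is hypothetical.

```python
def engineer_years(budget_usd: float, cost_per_engineer_year: float = 200_000) -> float:
    """Engineer-years a budget buys (the $200k default is an assumed cost, not a quoted figure)."""
    return budget_usd / cost_per_engineer_year


total = engineer_years(20_000_000)   # 100 engineer-years for a 20MM budget
years_for_team_of_20 = total / 20    # the same budget spread over 20 engineers
print(total, years_for_team_of_20)
```

Changing `cost_per_engineer_year` shows how sensitive the headcount is to the assumed cost per engineer.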

A quick reminder before you answer: whatever you think is actually involved in improving ROCm, unless you work on ROCm, you're almost certainly not considering an entire iceberg of complexity (runtime/driver/firmware).

Let's put it another way: forget AMD investing, I'll invest in you since you're so confident. I'll give you 20MM as a high-interest, non-dischargeable loan (say 8%) and all the runtime/driver/firmware source for AMDGPU. Up for it? All you have to do is improve ROCm such that it's competitive with CUDA and you can take home a huge slice of the TAM and you'll be rich. Easy right?

Cutting to the chase: you're off by at least two orders of magnitude on your goofy estimate; the real numbers are probably closer to 200MM invested every year for 10 years. And you still wouldn't be caught up because in those 10 years NVIDIA wasn't resting on its laurels just waiting for you to catch up!
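
The "two orders of magnitude" gap can be checked with a short sketch. It compares the upper end of the $10-20m estimate against 200MM per year for 10 years; the function and variable names are hypothetical.

```python
import math


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio between two positive amounts."""
    return math.log10(larger / smaller)


original_estimate = 20_000_000        # upper end of the $10-20m figure
revised_total = 200_000_000 * 10      # 200MM per year for 10 years
gap = orders_of_magnitude(revised_total, original_estimate)
print(f"ratio: {revised_total / original_estimate:.0f}x, orders of magnitude: {gap:.1f}")
```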

cyberax 597 days ago

> I love when outsiders throw around random-ass takes like this. Just curious: how'd you come up with this number? Is it backed by literally any thought/data/roadmap?

It's a multiple of what the TinyGrad ( https://tinygrad.org/#tinybox ) startup raised in capital. So $10-20m is absolutely reasonable, especially if you add an established HR with a hiring pipeline, established IT dept, offices, etc.

The multiplier is also easy to justify, given the stock price of NVidia and AMD.

> A quick reminder before you answer: whatever you think is actually involved in improving ROCm, unless you work on ROCm, you're almost certainly not considering an entire iceberg of complexity (runtime/driver/firmware).

Oh, I do. I've been following the open-source AMD driver development for the last 2 decades.

And I maintain that the total amount of investment that AMD needed to make to rival NVidia in market cap would have been around that number.

> Cutting to the chase: you're off by at least two orders of magnitude on your goofy estimate; the real numbers are probably closer to 200MM invested every year for 10 years.

For an entirely new company starting from scratch? Reasonable. But AMD is not a new company, and they already are doing most of the work needed.

xtreme 598 days ago

200MM/year gets you roughly 1000 engineers at a 200k salary. Is that not enough to make the ROCm experience equal to CUDA?

razodactyl 598 days ago

I appreciate this comment keeping us in line.

anon291 598 days ago

Oh, I guess I was responding to the "It's amazing" part. AMD sells a car without a steering wheel. NVIDIA sells one with a steering wheel, and it's not really amazing that people prefer that one (in my opinion at least).

teleforce 598 days ago

Never underestimate the developer ecosystem. Ballmer famously shouted "developers" over and over at one of the Microsoft Windows conferences, and now he's one of the richest people. Microsoft also went out of its way to introduce WSL for running Linux alongside Windows when they realized that the majority of OSes running on their Azure cloud are Linux.

princearthur 598 days ago

Some use cases require a small memory footprint, e.g. parallel inferences. I suppose there are also dark patterns like tracking, where you don't want the load to stand out.

Havoc 597 days ago

It’s less the size of the model and more mem throughput and NPU TOPS that are the limiting factor for this class of device

Which means you can go larger but it’ll become ever slower
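
A rough sketch of why larger models get slower on a bandwidth-limited device: if every generated token has to stream all the weights from memory once, memory bandwidth divided by model size gives an upper bound on tokens per second. This is a common rule of thumb, not something stated in the thread. The 100 GB/s bandwidth and 2 bytes per parameter are illustrative assumptions rather than specs of any real device, and the estimate ignores compute limits (NPU TOPS), KV-cache traffic, and caching effects.

```python
def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when each token reads all weights once (assumed model)."""
    model_size_gb = params_billion * bytes_per_param
    return mem_bandwidth_gb_s / model_size_gb


# Illustrative numbers only: 100 GB/s bandwidth, 16-bit (2-byte) weights.
for size_b in (1, 3, 8):
    bound = max_tokens_per_second(size_b, 2, 100)
    print(f"{size_b}B params: at most ~{bound:.1f} tokens/s")
```

Under these assumptions the bound falls in direct proportion to model size, which matches the "larger but ever slower" point.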
