by changoplatanero 866 days ago
Doesn't Nvidia have huge margins? So if someone just makes a clone of the Nvidia GPU, it could erode their margins and drive down the cost of compute.
4 comments

AMD will succeed at this as long as they keep it together.
Every time I'm tempted to think software is easy compared to hardware, I just remember that AMD is leaving about a trillion dollars' worth of market cap on the table, because they haven't figured out a good alternative to CUDA.
They are definitely putting a lot of effort into ROCm & HIP, and definitely accelerating.

ROCm 6 was out Dec 16 (2023), 5.5 in May (2023), 5 on Feb 10 (2022), and 4 on Dec 19 (2020).

Fred Brooks wrote in The Mythical Man-Month that it's harder (more time-consuming) to produce the software that corresponds to a given hardware. In 1975.
Hardware was much simpler then than it is now. I wonder how or if that's changed by going from hundreds or thousands of transistors to billions.
They’ll need to either reverse engineer CUDA or incentivize reimplementation of everything out there to use ROCm/OpenCL and forgo all the workload optimization done for Nvidia GPUs. I think that’s a nontrivial moat.
This has been my perception of AMD for the past 20 years. First against Intel, then ARM, now NVIDIA. "If only ..."
The real bitch is you need to both replicate the software and convince some large projects (e.g., PyTorch) to use and support your implementation, and it’s just all rough, very complicated, very fine-grained stuff. The hurdles here are very high.

And if you fuck that part up in any one of a dozen places, no one will use it, because the adoption cost is too high, or your implementation was 20% slower and so everything costs 20% more to use and no one uses it.
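To put rough numbers on the "20% slower" point, here is a back-of-the-envelope sketch. It assumes both chips rent for the same hourly price, and the function name and the two readings of "slower" are made up for illustration. If "20% slower" means each job takes 20% longer, cost goes up 20%; if it means 20% less throughput, cost goes up 25%.

```python
def relative_cost(slowdown: float, mode: str) -> float:
    """Cost per unit of work relative to the baseline (1.0 = same cost).

    Assumes the same hourly hardware price for both implementations
    (an assumption, not something stated in the thread).
    """
    if not 0.0 <= slowdown < 1.0:
        raise ValueError("slowdown must be in [0, 1)")
    if mode == "runtime":
        # Each job takes (1 + slowdown) times as long.
        return 1.0 + slowdown
    if mode == "throughput":
        # The hardware finishes only (1 - slowdown) times the work per hour.
        return 1.0 / (1.0 - slowdown)
    raise ValueError("mode must be 'runtime' or 'throughput'")


print(f"{relative_cost(0.20, 'runtime'):.2f}")     # 1.20
print(f"{relative_cost(0.20, 'throughput'):.2f}")  # 1.25
```

Either way the gap goes straight into the bill, before counting any porting cost.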

This is why you see things like TPUs never really damage NVIDIA, and why basically everyone is focused on open standards and open software. Basically the entire tech industry is using this approach as a way to slowly peel away the layers of this software until enough has been removed that NVIDIA can no longer use it as a moat.

While I doubt OpenAI will be a good fit for semiconductors, my understanding is PyTorch and TensorFlow have been really good at embracing new accelerators, largely due to XLA.

PyTorch, TF, and JAX work great on TPUs. Adoption is low because they are not really available outside Google Cloud.

AWS uses tricks to accelerate PyTorch with Inferentia/Trainium. Haven’t used it, but I have tried the equivalent for Apple silicon and rage quit after wasting half a day.
I mean, it took almost a decade to get there.
Right, but that was for XLA, no? I think (not an expert) that it compiles code from frameworks into a lower-level IR.

That's gotta be way easier, no?

If you are going to go vertical then do it properly.

OpenAI could just build their own framework for internal use that works well on their silicon (see JAX + TPU).

Their starting point? Triton plus some Triton libs. JAX chipped away at TF like this, and there's no reason why Triton can’t do the same to PyTorch.

Competitors don't have access to the process node. You'll get competitors, but they won't be as fast or able to run the latest models. That means they'll compete with older versions of NVIDIA's chips.
Agreed. Commoditizing the complement of OpenAI's models.