Hacker News
by Cacti 864 days ago
The real bitch is you also need to both replicate the software and convince some large projects (e.g., PyTorch) to use and support your implementation, and it’s all rough, very complicated, very fine-grained stuff. The hurdles here are very high.

And if you fuck that part up in any one of a dozen places, no one will use it: either the adoption cost is too high, or your implementation is 20% slower, so everything costs 20% more to run.
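
To put a rough number on the "20% slower" point, here is a back-of-the-envelope sketch. It assumes you pay per hour of accelerator time, so cost scales with runtime; the function name `cost_increase` is made up for illustration. Note that "20% slower" can mean two things, and the one measured as throughput is actually worse than 20%.

```python
def cost_increase(slowdown: float, by_throughput: bool) -> float:
    """Fractional cost increase for a fixed amount of work.

    Assumption: cost is proportional to runtime (pay-per-hour hardware).
    """
    if by_throughput:
        # Throughput drops by `slowdown`, so the same work takes
        # 1 / (1 - slowdown) times as long.
        return 1.0 / (1.0 - slowdown) - 1.0
    # Runtime is simply `slowdown` longer.
    return slowdown


# 20% longer runtime: cost goes up by 20%.
print(f"{cost_increase(0.20, by_throughput=False):.0%}")
# 20% lower throughput: cost goes up by 25%.
print(f"{cost_increase(0.20, by_throughput=True):.0%}")
```

Either way, the gap compounds over every training run and every inference request, which is why a small slowdown is enough to kill adoption.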

This is why things like TPUs never really damage NVIDIA, and why basically everyone is focused on open standards and open software. Basically the entire tech industry is using this approach to slowly peel away the layers of NVIDIA's software stack until enough has been removed that NVIDIA can no longer use it as a moat.

2 comments

While I doubt OpenAI will be a good fit for semiconductors, my understanding is PyTorch and TensorFlow have been really good at embracing new accelerators, largely due to XLA.

PyTorch, TF, and JAX work great on TPUs. Adoption is low because they are not really available outside Google Cloud.

AWS uses tricks to accelerate PyTorch with Inferentia/Trainium. Haven’t used it, but I have tried the equivalent for Apple silicon and rage quit after wasting half a day.

I mean, it took almost a decade to get there.

Right, but that was for XLA, no? I think (not an expert) that it compiles code from frameworks into a lower-level IR.

That's gotta be way easier, no?
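
For anyone wondering what "compiles code from frameworks into a lower-level IR" looks like in practice, here is a minimal sketch using JAX, which sits on top of XLA. The function `layer` and the shapes are made up for illustration, and the exact IR text depends on your JAX version, so no output is shown.

```python
import jax
import jax.numpy as jnp


def layer(x, w):
    # A tiny framework-level computation: matmul followed by a ReLU.
    return jnp.maximum(x @ w, 0.0)


x = jnp.ones((4, 8), dtype=jnp.float32)
w = jnp.ones((8, 2), dtype=jnp.float32)

# Trace the Python function and lower it to compiler IR without running it.
lowered = jax.jit(layer).lower(x, w)

# Text form of the IR module that JAX hands to XLA.
ir_text = lowered.as_text()
print(ir_text)

# Compile for whatever backend is available (CPU if there is no accelerator)
# and run the compiled executable.
compiled = lowered.compile()
result = compiled(x, w)
print(result.shape)
```

The point is that the framework code never mentions the hardware. A new accelerator only has to consume the IR, which is a much smaller surface than reimplementing every framework op by hand.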

If you are going to go vertical then do it properly.

OpenAI could just build their own framework for internal use that works well on their silicon (see JAX + TPU).

Their starting point? Triton plus some Triton libs. JAX chipped away at TF like this, and there's no reason why Triton can’t do the same to PyTorch.