tbh anyone can build fast hw for a single model, I’d audit their plan for a SW stack before joining. That said their arch is pretty unique so if they’re able to get these speeds it is pretty compelling
Our hardware architecture was not designed with LLMs in mind, let alone a specific model. It's a general purpose numerical compute fabric. Our compiler allows us to quickly deploy new models of any architecture without the need that graphics processors have for handwritten kernels. We run language models, speech models, image generation models, scientific numerical programs including for drug discovery, ...