
TPU and other accelerator performance varies by application... And even in a hugely popular config (like finetuning LLaMA with JAX) it's hard to find a good benchmark. But generally speaking, Google charges a pretty penny for TPUs.

Accelerators outside TPUs are exotic. Off the top of my head... Cerebras only offers their WSE-2 as a 1st party "pay for a specific training job" kinda thing. Intel Gaudi 2 is supposedly good but is mysterious to me, and Ponte Vecchio has barely started shipping. Graphcore and Tenstorrent chips in the wild seem kinda long in the tooth for big training jobs. The AMD MI300 is not shipping, and the older AMD Instincts are difficult to find in cloud services (maybe because they got eaten up for HPC?)

Lots of other promising accelerators (with my personal favorite being the Centaur x86 "accelerated" CPUs, perfect for dirt cheap LLM inference/LoRA training) died on the vine because of the CUDA moat, and I think more will share the same fate.