And yet Anthropic is paying xAI over a billion dollars a month for those out-of-date GPUs in their first datacentre (H100s being nearly 4 years old at this point).
Even A100s are still barely available on the major clouds despite being 6 years old.
Yeah, most of the performance increases have come from architectural improvements like reduced-precision tensor cores. AFAIK FP4 is basically the limit for floating-point matmuls; to go any lower in bits you need to switch to integer addition, and I don’t think we’ve figured out 1-bit LLMs just yet.
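
For a sense of how little room is left at 4 bits, here is a rough sketch that enumerates every FP4 value. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no inf/NaN codes), which is my assumption about what "FP4" means here; the function names are made up for illustration.

```python
def fp4_e2m1_value(code: int) -> float:
    """Decode a 4-bit code laid out as sign(1) | exponent(2) | mantissa(1).

    Assumes the E2M1 layout with exponent bias 1 and no inf/NaN codes.
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:
        # Subnormal range: no implicit leading 1, so only 0 and 0.5.
        magnitude = man * 0.5
    else:
        # Normal range: implicit leading 1, one fractional mantissa bit.
        magnitude = 2.0 ** (exp - 1) * (1 + man / 2)
    return sign * magnitude


def fp4_magnitudes() -> list[float]:
    """All distinct magnitudes representable across the 16 codes."""
    return sorted({abs(fp4_e2m1_value(c)) for c in range(16)})


if __name__ == "__main__":
    print(fp4_magnitudes())
```

With only eight distinct magnitudes, dropping another bit leaves almost nothing to split between exponent and mantissa, which is roughly why going lower tends to mean leaving floating point behind.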