by martinald
137 days ago
I think it's just routing to faster hardware:

- H100 SXM: 3.35 TB/s HBM3
- GB200: 8 TB/s HBM3e

That's 2.4x faster memory (8 / 3.35 ≈ 2.39), which is exactly the speedup they are claiming. I suspect they are just routing to GB200s (or TPU etc. equivalents). FWIW, I did notice that Opus was _sometimes_ very fast recently. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.
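Rough back-of-envelope for why the bandwidth ratio would show up directly as the speedup. This assumes decode is purely memory-bandwidth bound (each generated token streams the GPU's share of the weights from HBM once) and ignores batching, KV-cache reads, interconnect and compute limits. The 40 GB per-GPU weight figure is a hypothetical placeholder, not anything known about Opus, and it cancels out of the ratio anyway:

```python
# Back-of-envelope: single-stream decode ceiling when memory-bandwidth bound.
# Simplifying assumption: every generated token reads each GPU's weight shard
# from HBM exactly once; batching, KV-cache traffic and compute are ignored.

H100_SXM_TBPS = 3.35  # HBM3 bandwidth in TB/s (figure quoted in the comment)
GB200_TBPS = 8.0      # HBM3e bandwidth in TB/s (figure quoted in the comment)


def max_tokens_per_second(bandwidth_tbps: float, weight_bytes: float) -> float:
    """Upper bound on tokens/s if each token streams weight_bytes from HBM."""
    return bandwidth_tbps * 1e12 / weight_bytes


def speedup(old_tbps: float, new_tbps: float, weight_bytes: float) -> float:
    """Ratio of the two ceilings; weight_bytes cancels, leaving new/old bandwidth."""
    return (max_tokens_per_second(new_tbps, weight_bytes)
            / max_tokens_per_second(old_tbps, weight_bytes))


if __name__ == "__main__":
    # Hypothetical per-GPU weight shard of 40 GB, purely illustrative.
    shard = 40e9
    print(f"H100 SXM ceiling: {max_tokens_per_second(H100_SXM_TBPS, shard):.1f} tok/s")
    print(f"GB200 ceiling:    {max_tokens_per_second(GB200_TBPS, shard):.1f} tok/s")
    print(f"speedup: {speedup(H100_SXM_TBPS, GB200_TBPS, shard):.1f}x")
```

Under those assumptions the model size drops out entirely, so a ~2.4x speedup is what you'd expect from the hardware swap alone, with no software changes.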
Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.
[0] https://nitter.net/dylan522p/status/2020302299827171430