Hacker News
by PicardsFlute 34 days ago
The TTFT benchmarks don’t look right to me. I don’t use vLLM, but at 16k pre-fill, the M5 Max is 3.6 times faster than the M4 Max. The 5090 is surely faster, but the numbers in the article don’t reflect what I have seen so far. Perhaps vLLM hasn’t been updated to use the new tensor APIs for Metal?

My point is this: the charts should reflect that improvement for the M5, but they don’t. The pre-fill situation is not nearly as bad as it was in the M4 generation.