	by PicardsFlute 34 days ago
	The TTFT benchmarks don’t look right to me. I don’t use vLLM, but at 16k pre-fill, the M5 Max is 3.6 times faster than the M4 Max. The 5090 is surely faster still, but the numbers in the article don’t reflect what I have seen so far. Perhaps vLLM hasn’t been updated to use the new tensor APIs for Metal? My point is this: the charts should show that improvement for the M5, but they don’t. The pre-fill situation is not nearly as bad as it was in the M4 generation.