HN Mirror



	by vkaufmann 121 days ago
	GPT-OSS-120B runs like hell on my DGX Spark

1 comment

The MXFP4 variant, I suppose? My setup (RTX Pro 6000) does around 140 tok/s with llama.cpp and around 160 tok/s with vLLM.

Yep, MXFP4. Really fast :D