| HN Mirror



	by limoce 244 days ago
	> ollama gpt-oss 120b mxfp4 1 94.67 11.66

	This is insanely slow given its 200+ GB/s memory bandwidth. For comparison, I've tested GPT OSS 120B on Strix Halo and it gets 420 tps prefill and >40 tps decode.
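
	To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes the last two columns of the quoted row are prefill and decode tokens per second, which is how the comparison above reads them. The bandwidth-bound ceiling also rests on figures that are not in this thread: about 5.1B active parameters per token and about 4.25 bits per weight for MXFP4. Both are assumptions, and the estimate ignores KV-cache reads and any non-MXFP4 layers, so treat it as an optimistic upper bound, not a measurement.

```python
# Rough comparison of the quoted throughput numbers. The function names
# and the default assumptions below are illustrative, not from any tool.

def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster fast_tps is than slow_tps."""
    return fast_tps / slow_tps


def decode_ceiling_tps(bandwidth_gb_s: float,
                       active_params_billion: float,
                       bits_per_weight: float) -> float:
    """Upper bound on decode tokens/s if every token must stream all
    active weights from memory once (ignores KV cache and overhead)."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Numbers taken from the thread.
    spark_prefill, spark_decode = 94.67, 11.66
    halo_prefill, halo_decode = 420.0, 40.0  # decode was reported as ">40"

    print(f"prefill gap: {speedup(halo_prefill, spark_prefill):.1f}x")
    print(f"decode gap:  {speedup(halo_decode, spark_decode):.1f}x")

    # ASSUMPTIONS (not from the thread): 5.1B active params, 4.25 bits/weight.
    # 200 GB/s is the lower bound implied by "200+ GB/s".
    ceiling = decode_ceiling_tps(200.0, 5.1, 4.25)
    print(f"bandwidth-bound decode ceiling: ~{ceiling:.0f} tps")
```

	Under those assumptions the reported 11.66 tps decode sits far below what the memory bandwidth alone would allow, which is the point of the complaint.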

1 comment

nialse 243 days ago

Probably the quants have higher perplexity, but the Spark's performance seems lackluster. The reviewer videos I've seen so far try their best not to offend Nvidia or, rather, not to break their contracts.
