by limoce 244 days ago
> ollama gpt-oss 120b mxfp4 1 94.67 11.66

This is insanely slow given its 200+ GB/s memory bandwidth. For comparison, I tested GPT-OSS 120B on Strix Halo and got 420 tps prefill and >40 tps decode.
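For a rough sense of why that number looks low, here is a back-of-the-envelope sketch of the bandwidth ceiling on decode. It assumes decode is purely memory-bound and that each active weight is read once per token. The figures of ~5.1B active parameters per token and ~4.25 bits per weight for MXFP4 are my assumptions, not from the quoted benchmark, and I'm reading the last column of the quoted row as decode tps. The helper name `decode_tps_upper_bound` is made up for illustration.

```python
def decode_tps_upper_bound(bandwidth_gb_s, active_params_billion, bits_per_weight):
    """Rough ceiling on decode tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumptions (not from the thread): ~5.1B active params per token for GPT-OSS 120B,
# ~4.25 bits/weight for MXFP4 (4-bit values plus a shared scale per 32-weight block).
# 200 GB/s is the lower end of the "200+" bandwidth mentioned above.
bound = decode_tps_upper_bound(200, 5.1, 4.25)

# Assumed to be the decode column of the quoted ollama row.
measured = 11.66

print(f"bound ~{bound:.0f} tps, measured is {measured / bound:.0%} of it")
```

This ignores KV cache reads, any layers kept in higher precision, and compute limits, so real numbers will land below the ceiling, but probably not by this much.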

1 comment

Probably the quants have higher perplexity, but the Spark's performance seems lackluster. The reviewer videos I've seen so far try their best not to offend Nvidia or, rather, not to break their contracts.