This is insanely slow given its 200+ GB/s memory bandwidth. For comparison, I've tested GPT OSS 120B on Strix Halo and it gets 420 tps prefill and >40 tps decode.
The quants probably have higher perplexity, but the Spark's performance still seems lackluster. The reviewer videos I've seen so far try their best not to offend Nvidia or, rather, not to break their contracts.
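
To put a rough number on the bandwidth argument, here is a back-of-envelope sketch in Python. It assumes decode is purely memory-bound and that every active weight is read once per generated token, ignoring KV cache traffic and any compute limits. The function name is made up for illustration, and the 5.1B active parameters and ~4.25 bits per weight in the usage line are my assumptions for an MoE model at a 4-bit quant, not measured values.

```python
def decode_tps_upper_bound(bandwidth_gb_s: float,
                           active_params_b: float,
                           bits_per_weight: float) -> float:
    """Rough ceiling on decode tokens/s for a memory-bound model.

    Assumes each active weight is read from memory exactly once per token
    and nothing else (KV cache, activations) competes for bandwidth.
    """
    # Bytes of weights that must be streamed for one generated token.
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    # Tokens per second if the memory bus is fully saturated.
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed inputs: 200 GB/s bandwidth, 5.1B active params, ~4.25 bits/weight.
estimate = decode_tps_upper_bound(200, 5.1, 4.25)
print(f"theoretical decode ceiling: {estimate:.1f} tps")
```

Real decode speed will land well below this kind of ceiling, but it gives a sense of how much headroom a 200+ GB/s machine should have under these assumptions.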