by ydj
11 days ago
80 tps with a 5080 + 3090 combo is wild. I've been working with a 4090 and two Tenstorrent p150 cards, and manage only about 30 tps using all three for qwen3.6 27b q8. Guess I've got more optimization to do. I'd like to see the perf of their setup with and without MTP and ngram speculative decoding though, as well as parallel decode performance (once llama.cpp MTP plays well with multiple slots). Being in California, though, the electricity cost alone makes this non-competitive with just paying for cloud inference.
|
Very interesting though, these Tenstorrent chips. Might get one to experiment with.