Hacker News
by ycui7, 56 days ago
You can get 120 TPS (144 peak) with Qwen3.6-27B on an RTX PRO 6000 using an AutoRound quant with MTP enabled. That is faster than Sonnet API calls.
A 5090 gets maybe 100 TPS with MTP.
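If you want to sanity-check numbers like these on your own setup, here is a rough sketch of the arithmetic. It assumes TPS means generated (decode) tokens divided by wall-clock generation time. The function names and the 1200-token / 10-second run are made up for illustration and are not measurements from the comment above.

```python
def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


def speedup(tps_a: float, tps_b: float) -> float:
    """How many times faster setup A is than setup B."""
    if tps_b <= 0:
        raise ValueError("tps_b must be positive")
    return tps_a / tps_b


# Hypothetical run: 1200 tokens generated in 10 seconds.
example_tps = tokens_per_second(1200, 10.0)

# Ratio of the two figures quoted above (120 TPS vs. roughly 100 TPS).
pro6000_vs_5090 = speedup(120.0, 100.0)

print(f"example run: {example_tps:.1f} TPS")
print(f"RTX PRO 6000 vs 5090: {pro6000_vs_5090:.2f}x")
```

Going by the quoted figures, the RTX PRO 6000 comes out around 1.2x the 5090. Measure over a long enough generation that prompt processing and warm-up do not dominate the timing.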