by heipei
1 hour ago
Depends on what you mean by "local". On your MacBook, large dense models like Qwen 3.6 27B will be slow, sure. On a local workstation with a dedicated RTX card you can get > 100 tps, which is more than good enough to work with, and faster than cloud models in many cases.
|
I don't care how many tokens per second of nonsense it can generate.