|
|
|
|
|
by embedding-shape
202 days ago
|
|
6 tok/sec might be acceptable for a dense model that doesn't do thinking, but for something like DeepSeek 3.2 that does do reasoning, 6 tok/sec isn't acceptable for anything else but async/batched stuff, sadly. Even for a response with just 100 tokens we're talking a minute for it to just write the response, for anything except the smallest of prompts you'll easily be hitting 1000 tokens (600 seconds!). Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today. |
|