by xg15
3 hours ago
So GLM emits fewer tokens and makes fewer tool calls, but still takes over twice as long to complete. Can someone explain to me where that time is going, if not into the model operation itself? Are the individual tool calls more complex, so each one takes longer to complete? Or is the rate of tok/s lower because the model does more compute per token?

In addition to that, some of the open-weights models like GLM 5.2 or DeepSeek v4 Pro tend to be MUCH slower at generating tokens, which contributes to the perceived slowness. That said, I wouldn't call models like GLM 5.2 slow by any means; e.g. it is currently one of the fastest models inside Notion.
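
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch of how a model can emit fewer tokens and make fewer tool calls yet still finish later. It assumes wall time is just generation time plus time spent waiting on tools, and every number below is made up for illustration, not measured from GLM or any other model; `estimate_wall_time` is a hypothetical helper.

```python
def estimate_wall_time(output_tokens, tokens_per_second, tool_calls, seconds_per_tool_call):
    """Rough wall-clock estimate: generation time plus time waiting on tools.

    Assumption: ignores prompt processing, queueing and network latency.
    """
    generation_seconds = output_tokens / tokens_per_second
    tool_seconds = tool_calls * seconds_per_tool_call
    return generation_seconds + tool_seconds


# Hypothetical numbers only: model B emits fewer tokens and makes fewer
# tool calls than model A, but decodes at a quarter of the speed.
model_a = estimate_wall_time(output_tokens=20_000, tokens_per_second=100,
                             tool_calls=40, seconds_per_tool_call=2.0)
model_b = estimate_wall_time(output_tokens=15_000, tokens_per_second=25,
                             tool_calls=30, seconds_per_tool_call=2.0)

print(model_a, model_b)  # 280.0 660.0
```

With these invented inputs, model B takes more than twice as long even though it does less work by both counts, purely because of the lower tok/s. Slower individual tool calls would show up the same way through `seconds_per_tool_call`.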