by anerli
392 days ago
The Qwen team shows that parallel streams of inference-time thinking tokens can be far more efficient than a single serial stream. Compared with scaling parameters alone, their technique may achieve the same performance gain with a 22x smaller increase in memory and a 6x smaller increase in latency.