HN Mirror

	by anerli 439 days ago
	The Qwen team shows how parallel streams of inference-time thinking tokens could be far more efficient than a single serial stream. Compared with scaling parameters alone, their technique may achieve the same performance gain with a 22x smaller increase in memory and a 6x smaller increase in latency.