


	by jeeeb 103 days ago
	> It is also a bit weird that they are not incorporating speculative decoding

	Wouldn’t speculative decoding decrease overall throughput, but optimise (perceived) responsiveness?

1 comment

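In the compute-bound regime (large batch sizes), yes. At small batch sizes, though, decoding is memory-bandwidth bound, so the extra tokens verified per forward pass are nearly free and speculative decoding can improve throughput as well.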
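
To make the batch-size argument concrete, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes a toy roofline-style cost model in which one forward pass costs max(weight-load time, tokens × per-token compute time). All constants (`t_mem`, `t_compute_per_token`, `draft_cost`, the 0.8 acceptance rate, 4 draft tokens) are made-up illustrative values, and the function names are hypothetical. The expected-tokens formula is the usual geometric-series estimate for a fixed per-token acceptance rate.

```python
def expected_tokens_per_step(accept_rate, num_draft):
    """Expected tokens emitted per verify pass, assuming each draft token
    is accepted independently with probability accept_rate."""
    if accept_rate >= 1.0:
        return num_draft + 1
    return (1 - accept_rate ** (num_draft + 1)) / (1 - accept_rate)


def step_time(tokens_in_pass, t_mem=1.0, t_compute_per_token=0.01):
    """Toy roofline: a pass costs whichever is larger, the time to stream
    the weights (t_mem) or the time to do the math for all tokens.
    Both constants are arbitrary units chosen for illustration."""
    return max(t_mem, tokens_in_pass * t_compute_per_token)


def baseline_throughput(batch):
    """Plain autoregressive decoding: one token per sequence per pass."""
    return batch / step_time(batch)


def speculative_throughput(batch, num_draft=4, accept_rate=0.8, draft_cost=0.05):
    """Speculative decoding: the target model verifies num_draft + 1 tokens
    per sequence in one pass. draft_cost is an assumed ratio of one draft
    model pass to one target model pass."""
    verify = step_time(batch * (num_draft + 1))
    draft = num_draft * draft_cost * step_time(batch)
    tokens = batch * expected_tokens_per_step(accept_rate, num_draft)
    return tokens / (verify + draft)


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        base = baseline_throughput(batch)
        spec = speculative_throughput(batch)
        print(f"batch={batch:4d}  baseline={base:8.2f}  speculative={spec:8.2f}")
```

Under these assumptions speculative decoding comes out ahead at batch size 1 and behind at batch size 256, where the verify pass is already compute bound and the rejected draft tokens are wasted work. The crossover point depends entirely on the assumed constants.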