by qeternity 919 days ago
Batching helps throughput, and anyone running in production will already be batching.

But it's not free: it still comes at the cost of per-stream latency.
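
To make the tradeoff concrete, here's a rough toy model, not a benchmark. It assumes each decode step costs a fixed amount (loading weights) plus a small per-sequence amount, so step time grows with batch size. The function names and the `fixed_ms` / `per_seq_ms` numbers are made up for illustration and don't come from any real system.

```python
# Toy cost model for one decode step. All constants are hypothetical.
def decode_step_ms(batch_size, fixed_ms=20.0, per_seq_ms=0.5):
    """Time for one decode step: fixed cost plus a per-sequence cost."""
    return fixed_ms + per_seq_ms * batch_size


def throughput_tok_per_s(batch_size, fixed_ms=20.0, per_seq_ms=0.5):
    """Aggregate tokens/s: each step emits one token per sequence in the batch."""
    return batch_size * 1000.0 / decode_step_ms(batch_size, fixed_ms, per_seq_ms)


if __name__ == "__main__":
    for b in (1, 8, 32, 128):
        # Each stream receives one token per step, so per-stream
        # inter-token latency is simply the step time.
        print(
            f"batch={b:4d}  "
            f"throughput={throughput_tok_per_s(b):8.1f} tok/s  "
            f"per-stream latency={decode_step_ms(b):6.1f} ms/token"
        )
```

Under these assumptions aggregate throughput keeps climbing with batch size, but every individual stream waits longer for each token. Real systems have messier curves, but the direction is the same.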

Speculative decoding seems less effective in practice than in theory.