Hacker News


	by seldo 656 days ago
	For agentic use cases, where you might need several round-trips to the LLM to reflect on a query, improve a result, etc., fast inference means you can do more round-trips while still responding in a reasonable time. So basically any LLM use case benefits from greater speed, IMO.
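As a rough illustration of the round-trip argument, here is a minimal Python sketch. It assumes each call's time is dominated by generating its output tokens and ignores time to first token. The function `max_round_trips` and all numbers are hypothetical, chosen only to show the arithmetic.

```python
import math


def max_round_trips(budget_s: float, tokens_per_call: int, tok_per_sec: float) -> int:
    """Count how many sequential LLM calls fit in a response-time budget.

    Simplifying assumption: each call costs tokens_per_call / tok_per_sec
    seconds (generation time only; time to first token is ignored).
    """
    seconds_per_call = tokens_per_call / tok_per_sec
    return math.floor(budget_s / seconds_per_call)


# Illustrative numbers only: a 10 s budget and 500 output tokens per step.
for tps in (50, 250, 500):
    print(tps, "tok/sec ->", max_round_trips(10.0, 500, tps), "round-trips")
```

Under these assumptions, going from 50 to 500 tok/sec raises the number of sequential steps that fit in 10 s from 1 to 10.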


freediver 656 days ago

The problem with this is that tok/sec does not tell you the time to first token (TTFT). I've seen cases (with Groq) where TTFT is large for large prompts, nullifying the advantage of the faster tok/sec.
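A minimal sketch of why tok/sec alone can mislead, assuming a simple model where one call takes TTFT plus output tokens divided by throughput. The `Backend` class and both sets of numbers are hypothetical and are not measurements of Groq or any other provider.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    ttft_s: float       # time to first token in seconds (grows with prompt length)
    tok_per_sec: float  # generation throughput once tokens start streaming

    def call_latency(self, output_tokens: int) -> float:
        # Approximate end-to-end time for one call: wait for the first token,
        # then stream the output at the backend's tokens-per-second rate.
        return self.ttft_s + output_tokens / self.tok_per_sec


# Hypothetical backends with illustrative numbers.
fast_decode = Backend("fast decode, slow TTFT", ttft_s=2.0, tok_per_sec=500)
slow_decode = Backend("slow decode, fast TTFT", ttft_s=0.3, tok_per_sec=100)

for output_tokens in (200, 1000):
    for b in (fast_decode, slow_decode):
        print(f"{output_tokens:>5} tokens  {b.name}: {b.call_latency(output_tokens):.2f} s")
```

With these numbers, a 200-token answer takes about 2.40 s on the higher-tok/sec backend versus 2.30 s on the other, so the faster throughput only pays off for long outputs (4.00 s vs 10.30 s at 1000 tokens). In an agentic loop the TTFT is paid again on every round-trip, so it compounds.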
