


	by grepfru_it 373 days ago
	I'm curious: what is the need for 70 t/sec?

2 comments

Aeolun 373 days ago

Waiting minutes for your call to succeed is too frustrating?


ekianjo 373 days ago

Depends entirely on the use case. Not every LLM workflow is a chatbot


jbellis 373 days ago

no, but if you're not latency sensitive you should probably be using DeepSeek v3 (cheaper than flash, significantly smarter)


lostmsu 373 days ago

What makes you believe DeepSeek is smarter than Flash 2.5? It is lower on all leaderboards.


jbellis 372 days ago

you're right, I should clarify that I'm talking about no thinking mode, otherwise flash goes from "a bit more expensive than dsv3" to "10x more expensive"


cootsnuck 373 days ago

High concurrency voice AI systems.


grepfru_it 370 days ago

Why are you self hosting that?
