Hacker News
by
grepfru_it
373 days ago
I am curious: what is the need for 70 t/sec?
Aeolun
373 days ago
Waiting minutes for your call to succeed is too frustrating?
ekianjo
373 days ago
Depends entirely on the use case. Not every LLM workflow is a chatbot.
jbellis
373 days ago
No, but if you're not latency-sensitive, you should probably be using DeepSeek v3 (cheaper than Flash and significantly smarter).
lostmsu
373 days ago
What makes you believe DeepSeek is smarter than Flash 2.5? It is lower on all leaderboards.
jbellis
372 days ago
You're right; I should clarify that I'm talking about non-thinking mode. Otherwise Flash goes from "a bit more expensive than dsv3" to "10x more expensive".
cootsnuck
373 days ago
High concurrency voice AI systems.
grepfru_it
370 days ago
Why are you self hosting that?