by 1ba9115454 497 days ago
I can't imagine this setup will get more than 1 token per second.

I would love to see DeepSeek running on-premises with a decent TPS.

It says 4.25 TPS in the first para.
Honest mistake. Some people think HN is just a series of short tweets and haven’t realized they are links yet!
It's the modern way. Why read when you can just imagine facts straight out of your own brain?
I agree but also found your comment funny in the context of LLMs. People love getting facts straight out of their models.
4.25 TPS is enough for a lot of use cases.
That's still pretty slow, considering there's that "thinking" phase.
True, but 4.25 is the number we all want to know.
You can get 1 t/s on a Raspberry Pi.

https://youtu.be/o1sN1lB76EA?si=i8ecEBjLdV0zewFQ

This has nothing to do with the full 671B model; the Ollama models are distilled Qwen2.5.
I appreciate both of these comments, thank you both.