Hacker News new | ask | show | jobs
by hwertz 498 days ago
Well, just running on a 6C/12T Coffee Lake CPU, (I'm looking through these speeds in LM Studio as I type this..) I got like 2 tokens a second with Deepseek R1 14B, 3.4 with 7B Qwen, and 4.4 with 8B Llama, although out of those two I found 7B Qwen's answer to be a bit better. (My GTX1650 has 4GB VRAM, loading 1/4 the layers is pretty ineffective, GPU util went up to 10% and I gained like 1 token a second LOL.)

So it'd take a minute or two to type out one of those answers where it's got about 4 or 5 beefy paragraphs of thought and a decent sized paragraph for it's answer. I'll put it this way, I can type 120 WPM and it puts out text a bit faster than I could write it.

Input's a LOT faster though, I was asking these models to analyze a document so my input was like 2200 tokens, they all did well over 100 tokens a second on input.