Hacker News
by cbold 498 days ago
This is also interesting: "Customers will be able to use distilled flavors of the DeepSeek R1 model to run locally on their Copilot+ PCs."

This is news, because Microsoft seems happy to not be tied to OpenAI so heavily.

This could also save a huge amount of money for their Office 365 Copilot initiative.

I figure Microsoft started analyzing these models ASAP in their labs to catch up with OpenAI, Google, Anthropic etc.

By also hosting this model, they will help normalize the use of such models, from which they benefit immensely.

1 comment

Well, just running on a 6C/12T Coffee Lake CPU (I'm looking through these speeds in LM Studio as I type this), I got like 2 tokens a second with DeepSeek R1 14B, 3.4 with 7B Qwen, and 4.4 with 8B Llama, although out of those last two I found 7B Qwen's answer to be a bit better. (My GTX 1650 has 4GB VRAM, so loading 1/4 of the layers onto it is pretty ineffective: GPU util went up to 10% and I gained like 1 token a second LOL.)

So it'd take a minute or two to type out one of those answers where it's got about 4 or 5 beefy paragraphs of thought and a decent sized paragraph for its answer. I'll put it this way: I can type 120 WPM and it puts out text a bit faster than I could write it.
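
For a rough sanity check of the typing-speed comparison, here is a minimal sketch that converts the token rates quoted above into words per minute. The 0.75 words-per-token ratio is a common rule of thumb for English text, not something measured in these runs, and the function name is made up for illustration.

```python
# Assumption: ~0.75 English words per token (rule of thumb, not measured here).
WORDS_PER_TOKEN = 0.75


def tokens_per_sec_to_wpm(tokens_per_sec: float,
                          words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation rate in tokens/second to approximate words/minute."""
    return tokens_per_sec * 60 * words_per_token


# Generation rates reported in the comment above.
rates = {"DeepSeek R1 14B": 2.0, "7B Qwen": 3.4, "8B Llama": 4.4}

for name, tps in rates.items():
    print(f"{name}: ~{tokens_per_sec_to_wpm(tps):.0f} WPM")
```

Under that assumption the 7B and 8B runs land above 120 WPM, while 2 tokens a second works out closer to 90 WPM, so the comparison depends on which model is running and how many words its tokens actually cover.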

Input's a LOT faster though. I was asking these models to analyze a document, so my input was like 2200 tokens, and they all did well over 100 tokens a second on input.
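
To put the input and output speeds together, a back-of-the-envelope sketch: total wait is roughly prompt tokens divided by the input rate, plus output tokens divided by the generation rate. The 400-token answer length below is a hypothetical placeholder, not a number from these runs, and 100 tokens a second is used as a floor for the input rate since the runs were reported as "well over" that.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     input_tps: float, output_tps: float) -> float:
    """Rough latency: time to read the prompt plus time to generate the answer."""
    return prompt_tokens / input_tps + output_tokens / output_tps


PROMPT_TOKENS = 2200   # from the comment above
INPUT_TPS = 100.0      # lower bound; reported as "well over 100"
OUTPUT_TOKENS = 400    # hypothetical answer length, for illustration only

# Generation rates reported in the comment above.
rates = {"DeepSeek R1 14B": 2.0, "7B Qwen": 3.4, "8B Llama": 4.4}

for name, tps in rates.items():
    total = estimate_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, INPUT_TPS, tps)
    input_part = PROMPT_TOKENS / INPUT_TPS
    print(f"{name}: ~{total:.0f} s total, at most {input_part:.0f} s of it on input")
```

Reading the 2200-token document takes at most about 22 seconds at these rates, so generation dominates once answers run to a few hundred tokens.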