by rfoo 490 days ago
But... they are? o3-mini is faster than DeepSeek-R1 and has comparable capability. And while I hate the "AGI achieved internally" meme, o3 is significantly better than o1. Though I wonder how long it will be until DeepSeek-R3 happens. They could skip R2 too, citing Cloudflare R2 :P
3 comments

A big part of why R1 is much slower than o3-mini is that inference optimization has not yet been done in most solutions for serving R1 models (so R1 is comparable to o1 or o1 pro in terms of latency, rather than o1-mini or o3-mini). The MoE is already relatively efficient if perfectly load balanced in an inference setting, and should have latency and throughput equal to or better than an equivalent dense model with 37B parameters. In practice, thanks to MLA, inference should be faster still for long contexts compared to typical dense models. If DeepSeek or someone else distilled the model onto another MoE architecture with even fewer active parameters and properly implemented speculative decoding on top, one could gain additional inference speedups. I imagine we will see these things, but it will take a bit of time until they are all public.
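To put a rough number on the speculative decoding point, here is a back-of-envelope sketch, not a measurement of R1 or any real serving stack. It uses the standard idealized model in which each drafted token is accepted independently with probability `alpha`. The function names and the values of `alpha`, `gamma` (draft length) and `cost_ratio` (draft pass cost relative to a target pass) are illustrative assumptions.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Idealized assumption: each of the `gamma` drafted tokens is accepted
    independently with probability `alpha`; the target pass always
    contributes one token (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the bonus token.
        return float(gamma + 1)
    # Closed form of the geometric sum 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Rough wall-clock speedup over plain autoregressive decoding.

    `cost_ratio` is the (assumed) cost of one draft-model pass relative to
    one target-model pass. One step costs gamma draft passes plus one
    target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


if __name__ == "__main__":
    # Hypothetical numbers: 80% acceptance, 4 drafted tokens, cheap drafter.
    print(estimated_speedup(alpha=0.8, gamma=4, cost_ratio=0.05))
```

With a cheap drafter and a high acceptance rate, this model gives a speedup in the low single digits; real gains depend on batching, load balancing, and how well the drafter matches the target.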
I know that, I'm in this game. I was comparing API throughput/ttft/ttbt of DeepSeek's own R1 API, before it went viral in the West, against o3-mini.

I remain unconvinced that DeepSeek themselves didn't optimize their own V3 inference well enough and left another 2x~3x improvement on the table.

I am sure DeepSeek did optimize the inference cost of R1. They have not yet released an efficient MoE downscaling of it, i.e. an R1-mini.
I think you could reconsider DeepSeek-R1: it's actually really good.

In comparison, o3-mini gets very vague in its reasoning and gives surprisingly unhelpful answers (they come out too short).

Plus, let's not forget, R1 is available to use and modify under MIT license, which is great.

I actually forgot that o3-mini was available now. I was using o1 numbers.