Hacker News
by pama 490 days ago
I am sure DeepSeek did optimize the inference cost of R1. They have not yet released an efficient MoE downscaling of it, i.e., an R1-mini.