by ekidd 416 days ago
Also, the Chinese work is legit. DeepSeek introduced a whole bag of new techniques like GRPO, and released quite a bit of good open source tooling.

And Alibaba's Qwen team seems to be quite genuinely talented at "small" models, 32B parameters and below. Once you get Qwen3 properly configured, it punches well above its "weight class." I'm still running real benchmarks, but subjectively, it feels like the 32B model performs somewhere between 4o-mini and 4o on "objectively measurable" tasks. It's a little "stodgy" and formal by default, though. We'll see what it looks like when people start fine-tuning it.

If the US dropped off the planet, it would maybe set LLM technology back a year.

1 comment

DeepSeek really changed how people think about Chinese tech. Even after newer LLMs launched, DeepSeek R1 and V3 hold their own on benchmarks and are significantly cheaper.