Hacker News new | ask | show | jobs
by refulgentis 77 days ago
Reversing the X and Y axis, adding in a few other random models, and dropping all the small Qwens makes this worse than useless as a Qwen 3.5 comparison, it’s actively misleading. If you’re using AI, please don’t rush to copy paste output :/

EDIT: Lordy, the small models are a shadow of Qwen's smalls. See https://huggingface.co/Qwen/Qwen3.5-4B versus https://www.reddit.com/r/LocalLLaMA/comments/1salgre/gemma_4...

2 comments

I transposed the table so that it's readable on mobile devices.

I should have mentioned that the Qwen 3.5 benchmarks were from the Qwen3.5-122B-A10B model card (which includes GPT-5-mini and GPT-OSS-120B); apologies for not including the smaller Qwen 3.5 models.

It’s not readable on a phone either. Text wraps. unless you’re testing on foldable?
Small qwen models are magical
It's so so good.

I have an app I've been working on for 2.5 years and felt kinda stupid making sure llama.cpp worked everywhere, including Android and iOS.

The 0.8B beats every <= 7B model I've used on tool use and can do RAG. Like you could ship it to someone who didn't know AI and it can do all the basics and leave UX intact.