Hacker News


	by refulgentis 77 days ago
	Reversing the X and Y axes, adding in a few other random models, and dropping all the small Qwens makes this worse than useless as a Qwen 3.5 comparison; it’s actively misleading. If you’re using AI, please don’t rush to copy-paste the output :/ EDIT: Lordy, the small Gemma 4 models are a shadow of Qwen's smalls. See https://huggingface.co/Qwen/Qwen3.5-4B versus https://www.reddit.com/r/LocalLLaMA/comments/1salgre/gemma_4...

scrlk 77 days ago

I transposed the table so that it's readable on mobile devices.

I should have mentioned that the Qwen 3.5 benchmarks were from the Qwen3.5-122B-A10B model card (which includes GPT-5-mini and GPT-OSS-120B); apologies for not including the smaller Qwen 3.5 models.

refulgentis 77 days ago

It’s not readable on a phone either. Text wraps, unless you’re testing on a foldable?

BloondAndDoom 77 days ago

Small Qwen models are magical.

refulgentis 77 days ago

It's so so good.

I have an app I've been working on for 2.5 years and felt kinda stupid making sure llama.cpp worked everywhere, including Android and iOS.

The 0.8B beats every ≤7B model I've used at tool use, and it can do RAG. You could ship it to someone who knows nothing about AI and it would handle all the basics while leaving the UX intact.