HN Mirror


by Aurornis 66 days ago

So everyone is aware, you can already run Qwen3.5-27B on Vulkan or Apple's hardware. Every major inference engine supports it right now.

This repo is a vibecoded demo implementation of some recent research papers combined with some optimizations that sacrifice quality for speed to get a big number that looks impressive. The 207 tok/s number they're claiming only appears in the headline. The results they show are half that or less, so I already don't trust anything they're saying they accomplished.

If you want to run Qwen3.5-27B, you can do it with a project like llama.cpp on CUDA, Vulkan, Apple, or even CPU.

2 comments

Grimblewald 66 days ago

This. Even on Android via Termux, you can run Ollama with GPU acceleration on a phone. It works, though mileage will vary.


dirtikiti 64 days ago

Yes, you can run Qwen on Vulkan or CPU. But you aren't getting 207 t/s.

I just find it funny that they talk about vendor lock-in when the only thing they support is NVIDIA.
