by Aurornis 66 days ago
So everyone is aware, you can already run Qwen3.5-27B on Vulkan or Apple's hardware. Every major inference engine supports it right now.

This repo is a vibecoded demo implementation of some recent research papers combined with some optimizations that sacrifice quality for speed to get a big number that looks impressive. The 207 tok/s number they're claiming only appears in the headline. The results they show are half that or less, so I already don't trust anything they're saying they accomplished.

If you want to run Qwen3.5-27B, you can do it with a project like llama.cpp on CUDA, Vulkan, Apple hardware, or even CPU.
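
If you'd rather check a tok/s figure on your own hardware than trust a headline, here is a minimal sketch of the arithmetic. It assumes you wrap whatever engine you use in a callable that runs one generation and returns the number of tokens produced. The names `tokens_per_second`, `generate`, and `clock` are hypothetical and not part of llama.cpp or any other project.

```python
import time
from typing import Callable


def tokens_per_second(
    generate: Callable[[], int],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run generate() once and return tokens produced per wall-clock second.

    generate is a hypothetical wrapper around your inference engine; it must
    return the number of tokens it generated. clock is injectable for testing.
    """
    start = clock()
    n_tokens = generate()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed
```

This measures end-to-end time for the call, so prompt processing is included unless your wrapper excludes it. Run it several times with the same prompt and output length before comparing against someone else's number.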

2 comments

This. Even on Android via Termux you can run Ollama with GPU acceleration on a phone. It works, though mileage will vary.
Yes, you can run Qwen on Vulkan or CPU. But you aren't getting 207t/s.

I just find it funny that they talk about being vendor-locked, and the only thing they support is NVIDIA.