Hacker News new | ask | show | jobs
by re5i5tor 195 days ago
For anyone using Qwen3-VL: where are you running it? I had tons of reliability problems with Qwen3-VL inference providers on OpenRouter — based on uptime graphs I wasn’t alone. But when it worked, Qwen3-VL was pack-leading good at AI Vision stuff.
4 comments

I run the larger version of it on a Threadripper with 512GB RAM and a 32GB GPU for the non-expert layers and context, using llama.cpp. Performs great, however god forbid you try to get that much memory these days.
I’ve noticed that the open weight models have a lot of issues on OpenRouter. You get a lot of inconsistency in quality due to varying quants at least. I’ve had some seriously nonsensical responses from models that I can’t replicate at all when I switch providers. Lots that just randomly fail to handle requests too. I would recommend finding a provider that works best for your needs and pinning it.
My company's GPU cluster
I run it on ollama
the big boy model?
It's not that big of a model?
235B-A22B is pretty big.