| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by re5i5tor 195 days ago
	For anyone using Qwen3-VL: where are you running it? I had tons of reliability problems with Qwen3-VL inference providers on OpenRouter — based on uptime graphs I wasn’t alone. But when it worked, Qwen3-VL was pack-leading good at AI Vision stuff.

4 comments

lreeves 195 days ago

I run the larger version of it on a Threadripper with 512GB RAM and a 32GB GPU for the non-expert layers and context, using llama.cpp. Performs great, however god forbid you try to get that much memory these days.

link

sosodev 195 days ago

I’ve noticed that the open weight models have a lot of issues on OpenRouter. You get a lot of inconsistency in quality due to varying quants at least. I’ve had some seriously nonsensical responses from models that I can’t replicate at all when I switch providers. Lots that just randomly fail to handle requests too. I would recommend finding a provider that works best for your needs and pinning it.

link

btian 194 days ago

My company's GPU cluster

link

m00dy 195 days ago

I run it on ollama

link

nicman23 195 days ago

the big boy model?

link

adastra22 195 days ago

It's not that big of a model?

link

mkl 195 days ago

235B-A22B is pretty big.

link