Hacker News new | ask | show | jobs
by bearjaws 471 days ago
Available on ollama now as well.
3 comments

Is this the best way to run your own models these days?
It's the easiest to setup, but you can get 2x-6x faster with TGI and vLLM depending on the scenario.
vllm isn't even hard to setup!

I find it so funny that HN is sitting in the stoneage with LLM inference.

Meanwhile I'm here with sillytavern hooked to my own vllm server, getting crazy fast performance on my models and having a complete suite of tools for using LLMs.

Most folks on here have never heard of sillytavern, or oobabooga, or any of the other projects for LLM UI/UX (LM-studio). It's insanity that there hasn't been someone like ADOBE building a pro/prosumer UI for LLMs yet.

i could not find it, where did you?
Ollama's library butchers names, I believe its this: https://ollama.com/library/qwq

The actual name (via HF): https://huggingface.co/Qwen/QwQ-32B

It indeed seems to be https://ollama.com/library/qwq -- the details at https://ollama.com/library/qwq/blobs/c62ccde5630c confirm the name as "QwQ 32B"
ollama pull qwq
I have been using QwQ for a while, and a bit confused that they overwrote their model with same name. The 'ollama pull qwq' you mentioned seems to be pulling the newest one now, thanks.
I am running ‘ollama run qwq’ - same thing.

Sometimes I feel like forgetting about the best commercial models and just use the olen weights models. I am retired so I don’t need state of the art.