I find it so funny that HN is stuck in the stone age when it comes to LLM inference.
Meanwhile, I'm here with SillyTavern hooked up to my own vLLM server, getting crazy fast performance on my models and a complete suite of tools for using LLMs.
Most folks on here have never heard of SillyTavern, oobabooga, or any of the other LLM UI/UX projects (LM Studio, for example). It's insane that no one like Adobe has built a pro/prosumer UI for LLMs yet.
I have been using QwQ for a while, and I'm a bit confused that they overwrote the old model with a new one under the same name. The 'ollama pull qwq' you mentioned seems to be pulling the newest one now, thanks.