by verdverm 9 hours ago
I have tried llama-cpp; vllm is nicer (ray, handles queueing, doesn't have the cache invalidation bug for qwen/gemma models), and unsloth has toxic employees in their discord.

I've run 2 qwen/gemma @8bit with full context window side-by-side. Right now I have 4 models on my spark (qwen36moe, embedding, reranker, qwen3-1.7B) to support my markdown kb tool.

The setup is not as capable, but still good and gets better with models/algos. To me, it's more about the freedom to tinker, freedom from token bill anxiety, and a potential right to compute should the government/oligarchy decide it gets to decide who can access which models.

1 comment

> unsloth has toxic employees in their discord

Would you mind elaborating on this?

Sure,

I shared a project in their #research channel where I used their qwen36moe quant to refresh my PhD research. The channel had a topic that ended with something like "and all things research..."

One of their people accused me of self-promotion, and I reiterated that I shared it in that channel because it was their quant doing something (I thought) interesting as a research model. The number of people interested in the topic can be counted on your hands (in binary).

They remained accusatory, made it personal, and then started deleting messages. I suppose I escalated a bit (from their perspective), saying that this was not a good first encounter and that they could have asked me to move it instead of just deleting it. Then they deleted every message, including all of their own, and put me in timeout. Erased from history, unable to participate, and so I left.

A coworker of mine (ML guy) is also sus about their quants. Not that they're nefarious, more that their benchmark results do not mean the quants are better, and are possibly skewed / benchmaxxed.