| HN Mirror



	by samus 229 days ago
	We very much can, especially a Mixture of Experts model like this one with only 3B activated parameters. With an RTX 3070 (8 GB VRAM), 32 GB RAM, and an SSD, I can run such models at speeds tolerable for casual use.


embedding-shape 229 days ago

How many tok/s are you getting (with any runtime) with either the Kimi-Linear-Instruct or Kimi-Linear-Base on your RTX 3070?


samus 229 days ago

With Qwen3-30B-A3B (Q8) I'm getting 10-20 t/sec on KoboldAI, i.e., llama.cpp. That's faster than I can read, so good enough for hobby use. I expect this model to be significantly faster, but llama.cpp-based software probably doesn't support it yet.
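
For anyone wondering why this works on an 8 GB card, here is a rough back-of-envelope sketch in Python. It assumes about 30B total and 3B active parameters, and roughly 1 byte per parameter at Q8 (the real format adds some overhead for scales, and the KV cache is ignored). The helper name `weights_gb` is made up for illustration, so treat the numbers as ballpark only.

```python
# Back-of-envelope memory estimate for a Mixture of Experts model.
# All figures are assumptions for illustration, not measurements.

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring KV cache and overhead."""
    return params_billion * 1e9 * bytes_per_param / 1e9

BYTES_PER_PARAM_Q8 = 1.0   # assumption: ~8 bits per weight
VRAM_GB = 8                # RTX 3070
RAM_GB = 32                # system memory

total_gb = weights_gb(30, BYTES_PER_PARAM_Q8)   # every expert must be stored somewhere
active_gb = weights_gb(3, BYTES_PER_PARAM_Q8)   # weights actually read per generated token

fits_in_vram = total_gb <= VRAM_GB
fits_in_vram_plus_ram = total_gb <= VRAM_GB + RAM_GB

print(f"total weights:  ~{total_gb:.0f} GB")
print(f"read per token: ~{active_gb:.0f} GB")
print(f"fits in VRAM alone: {fits_in_vram}")
print(f"fits in VRAM + RAM: {fits_in_vram_plus_ram}")
```

The takeaway is that the full model does not fit in VRAM and has to be split between GPU and system RAM, but each token only touches about a tenth of the weights, which is why the speed stays usable.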
