| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pbgcp2026 48 days ago
	"Q4 has generally been the sweet spot" for self-hosting, yes. For any real meaningful work it's dumb AF. The only way to get reasonable intelligence from mid-size Gemma or Qwen is to run full precision BF16. Anything else is just an emulation of AI.

1 comments

EntityDeletr 48 days ago

I would disagree. I have 8 GB of VRAM and 32 GB of RAM. I can either run a 4B BF16 dense model fully on GPU at around 30 t/s or Qwen3.6 35B A3B Q5_K_M at 20 t/s with GPU offload. Which one would I choose?

link