by SwellJoe 6 days ago
I dunno about that. Gemma 4 is probably the best model for general self-hosted use for almost everyone who doesn't have a data center in their basement. They didn't have to release it at all, and they didn't have to release speculative decoding drafters, and they didn't have to release the QAT versions of the models, which make the 4-bit quantization perform very close to the bigger versions and can run in 32GB. I'd love a 122B version of it, and I didn't realize they'd ever announced one was coming (though I remember there being speculation about it). But, also, I'm happy they're doing so much with it. They've got all the sizes covered, it has great prose for an LLM, better prose than even most larger models, it's got great audio and vision, and broad language support. As self-hosted general purpose models go, it's the total package.

Qwen 3.6 is maybe better for code (though I'm beginning to think otherwise after some benchmarking I've been doing, where Gemma 4 has been overperforming expectations), but for just about anything else, Gemma 4 is the one.

If they're gimping it, why is nobody else making a better one that small?