by kurtbuilds 810 days ago
What’s the process to deliver and test a quantized version of this model?

This model is 264GB, so it can only be deployed in server settings.

Quantized Mixtral at 24GB is just small enough that it can run on premium consumer hardware (i.e. 64GB RAM).
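
For a rough back-of-envelope on the sizes above, here is a minimal sketch that scales a checkpoint linearly with bits per weight. It assumes the 264GB figure is for 16-bit weights, which I haven't verified, and the function names and the 8GB headroom for the OS and KV cache are made up for illustration. Real quantized files carry extra overhead (scales, unquantized layers), so treat the numbers as a lower bound.

```python
def quantized_size_gb(size_gb: float, source_bits: int, target_bits: int) -> float:
    """Estimate checkpoint size after quantization, scaling linearly with
    bits per weight. Ignores per-block scales and any layers kept at
    higher precision, so the real file will be somewhat larger."""
    return size_gb * target_bits / source_bits


def fits_in_ram(size_gb: float, ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """Check whether a model fits in RAM with some headroom left over.
    The 8GB default is an arbitrary assumption, not a measured figure."""
    return size_gb + headroom_gb <= ram_gb


if __name__ == "__main__":
    # Assumption: the 264GB checkpoint stores 16-bit weights.
    for bits in (8, 4, 3, 2):
        est = quantized_size_gb(264, 16, bits)
        verdict = "fits" if fits_in_ram(est, 64) else "does not fit"
        print(f"{bits}-bit: ~{est:.1f}GB, {verdict} in 64GB RAM")
```

Under those assumptions, even 4-bit would land above 64GB, whereas the 24GB Mixtral quant fits with room to spare.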