by MacsHeadroom 797 days ago
80GB in 4bit.

But because it only activates a subset of its experts for each token, it can run on a fast CPU in reasonable time. So 96GB of DDR4 will do. 96GB of DDR5 is better, since memory bandwidth is what limits generation speed.

1 comment

WizardLM-2 8x22b (a fine-tune of the Mixtral 8x22b base model) was only 80GB at 4-bit.
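
For anyone who wants to sanity-check the 80GB figure, here is a rough back-of-the-envelope sketch. Everything numeric in it is an assumption on my part, not something stated in this thread: roughly 141B total and 39B active parameters for Mixtral 8x22b, about 4.5 effective bits per weight for a 4-bit quant (the extra half bit standing in for scales and other quantization metadata), and dual-channel DDR4-3200 / DDR5-5600 peak bandwidths. The tokens-per-second number is only a theoretical ceiling from memory bandwidth, not a benchmark.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def max_tokens_per_sec(active_params: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    return bandwidth_gb_s / quantized_size_gb(active_params, bits_per_weight)


# All values below are assumptions for illustration, not measurements.
TOTAL_PARAMS = 141e9     # assumed total parameter count for Mixtral 8x22b
ACTIVE_PARAMS = 39e9     # assumed parameters touched per token
BITS_PER_WEIGHT = 4.5    # assumed: 4-bit weights plus quantization metadata
DDR4_GB_S = 51.2         # assumed: dual-channel DDR4-3200 peak bandwidth
DDR5_GB_S = 89.6         # assumed: dual-channel DDR5-5600 peak bandwidth

weights_gb = quantized_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
fits_in_96gb = weights_gb < 96

ddr4_ceiling = max_tokens_per_sec(ACTIVE_PARAMS, BITS_PER_WEIGHT, DDR4_GB_S)
ddr5_ceiling = max_tokens_per_sec(ACTIVE_PARAMS, BITS_PER_WEIGHT, DDR5_GB_S)

print(f"weights: ~{weights_gb:.0f} GB, fits in 96 GB: {fits_in_96gb}")
print(f"bandwidth ceiling: DDR4 ~{ddr4_ceiling:.1f} tok/s, DDR5 ~{ddr5_ceiling:.1f} tok/s")
```

Under those assumptions the weights land just under 80GB, which leaves some headroom in 96GB for context and the OS. Real throughput will be lower than the ceiling.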