The 2-bit version can run on a 24GB Titan RTX.
Perplexity scores on the wikitext2 dataset are as follows (size / perplexity):
Mixtral: 26GB / 3.79
Llama2-70B: 26.37GB / 4.13
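
For readers unfamiliar with the metric, here is a minimal sketch of how perplexity is generally defined: the exponential of the mean negative log-likelihood per token, so lower is better. This is not the evaluation script behind the numbers above; the function name and the sample log-probabilities are hypothetical and only illustrate the formula.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical values for illustration: a model that assigns probability 0.25
# to every one of 8 tokens is as uncertain as a uniform choice among 4 options.
example_log_probs = [math.log(0.25)] * 8
example_ppl = perplexity(example_log_probs)
print(example_ppl)
```

In a real evaluation the log-probabilities would come from the model's predictions over the wikitext2 test set rather than from fixed values.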