| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Randor 650 days ago

> I don't know what that means. It seems like some dubious math with percentages.

I would start by downloading a 1.58 model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens

Run the non-quantized version of the model on your 3090/4090 gpu and observe the power draw. Then load the 1.58 model and observe the power usage. Sure, the numbers have a wide range because there are many gpu/npu to make the comparison.

1 comments

kayo_20211030 650 days ago

Good one!

link