Hacker News new | ask | show | jobs
by kayo_20211030 605 days ago
I'm not an AI person, in any technical sense. The savings being claimed, and I assume verified, are on ARM and x86 chips. The piece doesn't mention swapping mult to add, and a 1-bit LLM is, well, a 1-bit LLM.

Also,

> Additionally, it reduces energy consumption by 55.4% to 70.0%

With humility, I don't know what that means. It seems like some dubious math with percentages.

2 comments

Not every instruction on a CPU or GPU uses the same amount of power. So if you could rewrite your algorithm to use more power efficient instructions (even if you technically use more of them), you can save overall power draw.

That said, time to market has been more important than any cares of efficiency for some time. Now and in the future, there is more of a focus on it as the expenses in equipment and power have really grown.

> I don't know what that means. It seems like some dubious math with percentages.

I would start by downloading a 1.58 model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens

Run the non-quantized version of the model on your 3090/4090 gpu and observe the power draw. Then load the 1.58 model and observe the power usage. Sure, the numbers have a wide range because there are many gpu/npu to make the comparison.

Good one!