| HN Mirror



	by zhisbug 1213 days ago
	Lmsys hasn't released an official 4-bit version, so it might be better to wait for one. Still, it is interesting to learn that the third-party 4-bit version shows degraded performance.


superkuh 1213 days ago

Lmsys hasn't released any official weights for anything. They've released "deltas" and other people have applied those deltas to the appropriate llama weights and done the quantization.

I reject your premise that the 8-to-4-bit quantization is the cause of the vicuna fine-tuned llama's very average performance, though. This hasn't been the case for any of the other 8-to-4-bit quantizations; it would be a unique outlier. So I don't think this is the "cause" here.
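To make the two steps mentioned above concrete (applying a "delta" to base weights, then quantizing the result to 4 bits), here is a minimal sketch in plain Python. It is only an illustration: the function names, the tensor name `layer.weight`, and the toy values are all made up, and the simple per-tensor round-to-nearest scheme is an assumption, not what FastChat or llama.cpp actually implement.

```python
def apply_delta(base, delta):
    """Merge weights as base + delta, tensor by tensor (hypothetical helper)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must have the same tensor names")
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }


def quantize_4bit(values):
    """Symmetric round-to-nearest quantization to integer codes in [-8, 7].

    Assumption: one scale per tensor. Real tools typically use a scale per
    small block of weights, which this sketch does not model.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Toy data standing in for real model tensors (values are invented).
base = {"layer.weight": [0.5, -1.0, 0.25, 0.0]}
delta = {"layer.weight": [0.25, 0.5, -0.25, 0.7]}

merged = apply_delta(base, delta)
codes, scale = quantize_4bit(merged["layer.weight"])
restored = dequantize(codes, scale)

# Worst-case rounding error introduced by the 4-bit step.
max_error = max(abs(m - r) for m, r in zip(merged["layer.weight"], restored))
```

In this toy scheme the rounding error per weight is bounded by about half of `scale`. Whether that level of error explains a quality drop in a real model is exactly the question being argued in this thread; the sketch does not settle it.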


zhisbug 1213 days ago

And I think the problem of vicuna taking on the user's role is caused by this bug: https://github.com/lm-sys/FastChat/commit/1bb234265d16bdfd50...

which has been fixed recently.

Lmsys is launching new training jobs after this patch; please stay tuned.


superkuh 1213 days ago

Nah, I don't use huggingface transformers to run inference with the vicuna model. I use llama.cpp. But I do appreciate the tip.

edit: Oh, I was completely wrong. That's in the training not the inference so it applies to all the weights.


zhisbug 1213 days ago

My point is that I am not aware of any official 4-bit quantized version (delta or weights) from lmsys, so it might be too early to conclude that vicuna fine-tuned llamas lose a lot of performance at 4 bits while others are fine.
