by zhwu 1174 days ago
If you run this command from their instructions (https://github.com/lm-sys/FastChat#vicuna-13b), the delta is downloaded automatically and applied to the base model: `python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`
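For anyone wondering what "applying the delta" means, here is a rough sketch of the idea, not FastChat's actual code. It assumes the delta is simply the per-parameter difference between the fine-tuned and base weights, with matching parameter names and sizes. The `apply_delta` function and the toy parameter names below are hypothetical, and plain Python lists stand in for tensors.

```python
# Toy illustration only: plain dicts of lists stand in for model state dicts.
# Assumption: target = base + delta, parameter by parameter.

def apply_delta(base, delta):
    """Return new weights computed as base + delta for every parameter."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must have the same parameter names")
    target = {}
    for name, base_w in base.items():
        delta_w = delta[name]
        if len(base_w) != len(delta_w):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise addition; the set of parameters is left unchanged.
        target[name] = [b + d for b, d in zip(base_w, delta_w)]
    return target


# Hypothetical parameter names and values, chosen only for the example.
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
delta = {"layer0.weight": [0.25, -0.5], "layer0.bias": [0.0]}
target = apply_delta(base, delta)
```

Under that assumption the output has the same parameter names and sizes as the base model, only with different values.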
2 comments

This can then be quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?
I may have missed the detail, but it also expects the PyTorch conversion rather than the original LLaMA model.
Yes, you need to convert the original LLaMA model to the Hugging Face format, as described in https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...