	by MMMercy2 1171 days ago
	You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on huggingface and will be automatically downloaded.
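For readers new to the term: a delta release ships only the difference between the fine-tuned weights and the base weights, so "applying" it means adding the two, parameter by parameter. Here is a toy sketch of that idea in plain Python. It is not FastChat's code; the parameter name and the numbers are made up for illustration, and real checkpoints hold tensors rather than lists.

```python
def apply_delta(base, delta):
    """Return base + delta for every named parameter (toy version using lists)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must have the same parameter names")
    target = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise sum: fine-tuned weight = base weight + delta.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Hypothetical one-parameter "model", purely for illustration.
base = {"layer0.weight": [1.0, 2.0, 3.0]}
delta = {"layer0.weight": [0.5, -1.0, 0.25]}
target = apply_delta(base, delta)
print(target)
```

The function builds a new dictionary and leaves `base` untouched, so the original weights stay available if the result needs to be recomputed.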

2 comments

superkuh 1171 days ago

Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the Vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made 4-bit quantized Vicuna weights in ggml format, https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8GB, ready to go, no steps needing 60+GB of RAM)


eurekin 1171 days ago

I did try, but got:

```
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```


superkuh 1171 days ago

> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.


DrSiemer 1171 days ago

Just rename it in tokenizer_config.json
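A minimal sketch of that rename, under a few assumptions not stated in the thread: that the file meant is the model directory's `tokenizer_config.json`, that the class name sits under its `tokenizer_class` key, and that the name the installed transformers version expects is `LlamaTokenizer`. The helper `rename_tokenizer_class` is hypothetical, written here for illustration only.

```python
import json
from pathlib import Path


def rename_tokenizer_class(config_path, old="LLaMATokenizer", new="LlamaTokenizer"):
    """Swap the tokenizer class name in a tokenizer config file.

    Returns True if the file was changed, False if the old name was not found.
    The default names are assumptions; adjust them to match your error message.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("tokenizer_class") != old:
        return False  # nothing to rename, leave the file untouched
    config["tokenizer_class"] = new
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

It could be called as `rename_tokenizer_class("/path/to/model/tokenizer_config.json")`, where the path is a placeholder. Editing the file by hand works just as well; the script only helps when several model directories need the same change.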


eurekin 1170 days ago

Thanks, that indeed worked!

That, plus using conda in WSL2 instead of on bare Windows.
