	by MMMercy2 1179 days ago
	I am a Vicuna developer. We plan to release the weights once we have addressed all concerns and have a low-resource version of the inference code ready. We released the demo first to get some early feedback on the model.

5 comments

techdragon 1179 days ago

We hear a lot about "concerns", and many of us don't share the same ones... For clarity, it would be good to know which concerns you feel are important enough to hold back releasing the weights.

zhwu 1179 days ago

It is mainly because of the legal issues caused by the license of the LLaMA model weights. We need to figure it out with Meta's LLaMA team before releasing.

jart 1179 days ago

Hi. I wrote the weights file format that llama.cpp uses, as of yesterday: https://github.com/ggerganov/llama.cpp/pull/613. What can I do to help you get these deltas ready?

number6 1179 days ago

Financial and political would be my guess. But maybe I just want to tease out an answer...

nenkoru 1179 days ago

It would be great if you could help me with this PR [1], as well as with adding support for exporting a model that was quantized using GPTQ, bitsandbytes, or plain torch. This would bring the best of both worlds:

- Low memory footprint (thanks to quantization)

- Fast inference (thanks to IO binding)

In the case of Alpaca in particular, I have seen a 5x decrease in latency on an A100 and 10x on an AMD EPYC. I believe this is the way for users to have an AI that can generate a response as fast as their hardware allows. I have also added a link to my profile on HF [2], with small Alpacas converted to ONNX format. Take a look at them.

[1] https://github.com/huggingface/optimum/pull/922

[2] https://huggingface.co/nenkoru
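
For readers wondering where the "low memory footprint" comes from, here is a minimal sketch of symmetric absmax int8 quantization. It is not GPTQ or bitsandbytes (both are considerably more involved), and the function names are made up for illustration. Each weight is stored as one small integer plus a shared scale instead of a 32-bit float.

```python
def quantize_absmax(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127 if peak else 1.0  # avoid dividing by zero for all-zero input
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [254.0, -100.0, 3.2, 0.0]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is at most half of the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, restored, max_error)
```

The trade-off is visible in the third weight: 3.2 comes back as 4.0, because small values are rounded to the nearest multiple of the scale.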

imjonse 1179 days ago

Has LoRA been considered as possible alternative for finetuning on your dataset? In that case releasing the 'diff' against the LLaMA weights would be simpler to work with.
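
To illustrate why a LoRA 'diff' is small, here is a rough sketch in plain Python, assuming the usual formulation in which the fine-tuned matrix is W + scale * (B @ A) with B and A of low rank. The helper names and the toy numbers are hypothetical, and this is not how Vicuna was trained.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, b, a, scale=1.0):
    """Return w + scale * (b @ a), i.e. the base weights with the adapter merged in."""
    update = matmul(b, a)
    return [[wv + scale * uv for wv, uv in zip(w_row, u_row)]
            for w_row, u_row in zip(w, update)]


# Toy 3x3 base matrix with a rank-1 adapter (B is 3x1, A is 1x3).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[1.0], [2.0], [3.0]]
a = [[0.5, 0.0, -0.5]]

merged = merge_lora(w, b, a)

# Only B and A need to be shipped: 6 numbers here instead of 9.
adapter_size = len(b) * len(b[0]) + len(a) * len(a[0])
full_size = len(w) * len(w[0])
print(merged, adapter_size, full_size)
```

The saving is trivial at this size, but for a d x d matrix a rank-r adapter needs 2 * d * r numbers instead of d * d.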

zhisbug 1179 days ago

Yeah, that might work, but this model wasn't tuned with LoRA.

luckystarr 1179 days ago

If it's based on LLaMA, aren't these weights just some sort of "patch" for the initial model, which is licensed under a restrictive license?

Or is this work "transferable" to other LLMs, once they become available?
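
A minimal sketch of the "patch" idea, assuming the delta is simply the element-wise difference between the fine-tuned and base weights. The function names and the tiny parameter dictionaries are made up for illustration; nothing in this thread confirms the format the Vicuna team will actually use.

```python
def make_delta(base, tuned):
    """Element-wise difference tuned - base for every named parameter."""
    return {name: [t - b for t, b in zip(tuned[name], base[name])] for name in base}


def apply_delta(base, delta):
    """Rebuild the fine-tuned parameters from the base weights plus a delta."""
    return {name: [b + d for b, d in zip(base[name], delta[name])] for name in base}


# Hypothetical parameters, flattened to lists of floats.
base = {"layer0.weight": [1.0, -2.0], "layer0.bias": [0.5]}
tuned = {"layer0.weight": [1.5, -2.25], "layer0.bias": [0.75]}

delta = make_delta(base, tuned)
rebuilt = apply_delta(base, delta)
print(delta, rebuilt)
```

The delta is useless without the base weights, so anyone applying it still needs their own copy of the original model. With real floating-point weights the round trip may differ from the fine-tuned values by rounding error.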

shallichange 1178 days ago

Why is it not called Vicuña, as it should be? "Vicuna" is not pronounced the same way.
