by turblety 1177 days ago
> In our first release, we will share the training, serving, and evaluation code. We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so. Join our Discord server and follow our Twitter to get the latest updates.

Please correct me if I'm wrong, but it seems like this is not actually an open-source model? As in, I can't download and run this on my own laptop?

> The cost of training Vicuna-13B is around $300. The training and serving code, along with an online demo, are publicly available for non-commercial use.

However, if this is true, it seems pretty cool if I can train a GPT4-like model for only $300. Is this as easy as cloning a repo and running it on an AWS GPU instance?

6 comments

I am a Vicuna developer. We plan to release the weights once we have addressed all concerns and have a low-resource version of the inference code ready. We released the demo first to get some early feedback on the model.
We hear a lot about "concerns", and many of us don't share the same ones... For clarity, it would be good to know which concerns you feel are important enough to hold back releasing the weights?
It is mainly because of the legal issues caused by the license of llama model weights. We need to figure it out with Meta's llama team before releasing.
Hi. I wrote the weights file format that llama.cpp uses, as of yesterday https://github.com/ggerganov/llama.cpp/pull/613 What can I do to assist you getting these deltas ready?
Financial and Political would be my guess. But maybe I just want to tease out an answer...
It would be great if you could help me with this PR, as well as with adding support for exporting a model that was quantized using GPTQ, bitsandbytes, or plain torch. This would bring the benefits of both worlds:

- Low memory footprint (thanks to quantization)

- Fast inference (thanks to IO binding)

In the case of Alpaca in particular, I have seen a 5x decrease in latency on an A100 and 10x on an AMD EPYC. I believe this is the way for users to have an AI that can generate a response as fast as their hardware allows. I have also added a link to my profile on HF, with small Alpacas converted to ONNX format. Take a look at them.

[1] https://github.com/huggingface/optimum/pull/922

[2] https://huggingface.co/nenkoru
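To put the "low memory footprint" point in numbers, here is a rough weights-only estimate for a 13B-parameter model at a few storage widths. This is a sketch under stated assumptions: the parameter count is the nominal 13 billion from the model name, the bit widths are illustrative, and activations, KV cache, and runtime overhead are ignored.

```python
# Weights-only memory estimate; ignores activations, KV cache, and overhead.
PARAMS = 13_000_000_000  # assumption: nominal count implied by the "13B" name

# Bits per weight for a few common storage formats (illustrative choices).
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weights_gb(params: int, bits: int) -> float:
    """Return the size of the weights in gigabytes (10**9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weights_gb(PARAMS, bits):.1f} GB")
```

Real quantized formats store extra scale and zero-point data, so actual files will be somewhat larger than these figures.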

Has LoRA been considered as possible alternative for finetuning on your dataset? In that case releasing the 'diff' against the LLaMA weights would be simpler to work with.
Yeah, that might work, but this model wasn't tuned with LoRA.
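For context on why a LoRA "diff" would be simpler to ship: LoRA replaces the update to a d_out x d_in weight matrix with two low-rank factors, B (d_out x r) and A (r x d_in), so only r * (d_out + d_in) numbers need to be distributed per matrix. A minimal sketch of that count follows; the 5120 hidden size and rank 8 are assumptions chosen for illustration, not values taken from this thread.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of entries in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Entries in the two LoRA factors: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


HIDDEN = 5120  # assumption: hidden size of a 13B LLaMA-style model
RANK = 8       # assumption: an illustrative LoRA rank

if __name__ == "__main__":
    dense = full_params(HIDDEN, HIDDEN)
    low_rank = lora_params(HIDDEN, HIDDEN, RANK)
    print(f"dense matrix:  {dense:,} values")
    print(f"LoRA factors:  {low_rank:,} values")
    print(f"reduction:     {dense / low_rank:.0f}x")
```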
If it's based on LLaMA, aren't these weights just some sort of "patch" for the initial model, which is licensed under a restrictive license?

Or is this work "transferable" to other LLMs, once they become available?
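The "patch" reading can be sketched in a few lines. This assumes a delta is simply the elementwise difference between the fine-tuned and base tensors, which is my interpretation of "delta weights" and not something the authors have confirmed; the tensor name and values below are made up, and plain lists stand in for real tensors.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def make_delta(base: Weights, tuned: Weights) -> Weights:
    """Publisher side: subtract the base weights from the fine-tuned ones."""
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in tuned
    }


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """User side: add the delta back onto a locally obtained base model."""
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in delta
    }


if __name__ == "__main__":
    # Hypothetical one-tensor "model" for illustration only.
    base = {"layer0.weight": [0.5, -1.0, 2.0]}
    tuned = {"layer0.weight": [0.75, -1.25, 2.0]}
    delta = make_delta(base, tuned)
    print(delta)
    print(apply_delta(base, delta) == tuned)
```

Under this reading the delta is useless without the original weights, and it is tied to that specific base model: adding it to a different LLM's weights would not reproduce the fine-tuned model.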

Why is it not called Vicuña, as it should be? Vicuna does not sound the same.
> actually an open-source model

I found Debian's Machine Learning policy on that interesting: they require the training data to be released under an open-source license (plus the training code etc.) before they consider a model actually open.

https://salsa.debian.org/deeplearning-team/ml-policy/

> However, if this is true, it seems pretty cool if I can train a GPT4-like model for only $300.

Not train, but fine-tune a model that Facebook spent millions of dollars training.

They absolutely did not spend millions to train it. Credible estimates place the cost for an entity like Meta at about 30-100k, probably less since Meta likely owns the 256 A100x8s needed to train it.

Even as an individual, it wouldn't cost you anywhere near a million if you only trained 13B and took advantage of volume pricing.

I don’t think it’s fair to just ignore the capex part of the model training costs. If we take AWS pricing, the 21 days of training for 65B cited in the llama paper would cost 2.6m at reserved prices. While there’s a lot of AWS profit there, it’s a reasonable first approximation of the TCO of that hardware. Even if real TCO is a third, that’s still nearly a million to train 65B, never mind the staff costs.
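The arithmetic behind that estimate can be checked with the figures quoted in this thread. A rough sketch, assuming the 21-day run used the 256 eight-GPU A100 nodes mentioned upthread (combining those two numbers is my assumption):

```python
NODES = 256                  # 8x A100 nodes, as quoted upthread
DAYS = 21                    # training time for 65B cited above
QUOTED_COST_USD = 2_600_000  # the reserved-price total cited above

# Total node-hours for the run.
node_hours = NODES * DAYS * 24

# Hourly price per 8-GPU node that the quoted total implies.
implied_rate = QUOTED_COST_USD / node_hours

# The "real TCO is a third" scenario.
one_third = QUOTED_COST_USD / 3

if __name__ == "__main__":
    print(f"node-hours:        {node_hours:,}")
    print(f"implied $/node-hr: {implied_rate:.2f}")
    print(f"one third of cost: ${one_third:,.0f}")
```

So the $2.6M figure corresponds to roughly $20 per node-hour, and a third of it is about $867K, which is where "nearly a million" comes from.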
Plus there are bound to be false starts, reverts, crashes, etc. that bump up the actual reproduction cost. Most training cost estimates take an extremely rosy best-case view, assuming everything goes smoothly on the first try and no GPU cycles are wasted.
Could I get a source for that? Not that I don't believe you, but my napkin math puts the cost of training the 65b parameter model alone at a lot higher than 100k.
It's not GPT-4 like. If you see the tests it's not even GPT-3.5 like. They just used GPT-4 for evaluation.
Does anyone know exactly what config is needed for training? Presumably the $300 is for some GPU-heavy instance on EC2? Is it $300 of EC2 p4de.24xlarge, which is $40.96 an hour? Or maybe 7-8 nodes for an hour? Something else?
From the post:

> The training was done with PyTorch FSDP on 8 A100 GPUs in one day.

> We employ SkyPilot managed spot to reduce the cost by leveraging the cheaper spot instances with auto-recovery for preemptions and auto zone switch. This solution slashes costs for training the 7B model from $500 to around $140 and the 13B model from around $1K to $300.

So, this is using for example a2-ultragpu-8g (8x A100-80GB) on GCP using spot instances. You can use SkyPilot to quickly see the price is $12.8 per hour (~$307 for a day):

» sky launch --gpus A100-80GB:8 --use-spot

Check out detailed CLI instructions and SkyPilot YAMLs here if you want to give it a try:

- https://github.com/lm-sys/FastChat#vicuna

- https://github.com/lm-sys/FastChat/blob/main/scripts/train-v...
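As a quick sanity check on those numbers, here is the arithmetic for a one-day run at the two hourly rates quoted in this thread. The 24-hour duration comes from the "in one day" line in the post; the labels in the dictionary are just descriptive names I chose.

```python
HOURS = 24  # "8 A100 GPUs in one day"

# Hourly rates per 8-GPU node, as quoted in this thread.
RATES_USD_PER_HOUR = {
    "GCP a2-ultragpu-8g (spot)": 12.80,
    "AWS p4de.24xlarge (on-demand)": 40.96,
}


def run_cost(rate: float, hours: float) -> float:
    """Cost in USD of running one node for the given number of hours."""
    return rate * hours


def hours_for_budget(budget: float, rate: float) -> float:
    """How many node-hours a budget buys at a given hourly rate."""
    return budget / rate


if __name__ == "__main__":
    for name, rate in RATES_USD_PER_HOUR.items():
        print(f"{name}: ${run_cost(rate, HOURS):,.2f} for {HOURS} h")
    print(f"$300 at $40.96/h buys {hours_for_budget(300, 40.96):.1f} h")
```

The on-demand rate lands close to the "around $1K" figure in the post, and the spot rate close to the $300 one.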

What's wrong with vast.ai? You'd probably be looking at $2/hr. So about $50 for the whole fine-tuning run.
Their mention of SkyPilot isn't an accident, it seems to be a "find me cheap spot instances" project: https://github.com/skypilot-org/skypilot#readme and as best I can tell that's what the yaml files in their repo are for: https://github.com/lm-sys/FastChat/blob/main/scripts/train-v...
Train it on what?

I want to host our own AI, have it ingest our docs, and answer questions about them. What is the best tool today to do that?

Your best bet (today) is actually not to train a model, but to use an existing model and connect it to your docs via vector search. For example: https://python.langchain.com/en/latest/use_cases/question_an...
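To make the idea concrete, here is a toy sketch of the retrieve-then-prompt pattern using only the standard library. It is not LangChain's API: word-count vectors with cosine similarity stand in for real embeddings, and the documents, function names, and prompt wording are all made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list, k: int = 1) -> list:
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list, k: int = 1) -> str:
    """Put the retrieved text in front of the question for the model."""
    context = "\n".join(retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents for illustration.
DOCS = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first day of each month.",
    "The API rate limit is 100 requests per minute.",
]

if __name__ == "__main__":
    print(build_prompt("How do I reset my password?", DOCS))
```

The resulting prompt would then be sent to whichever model you host. In a real setup the word counts would be replaced by embeddings from a model and the sorted list by a vector index, but the flow stays the same.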
I heard about something like that, https://gerev.ai

They are working on exactly this, and it's even open source and self hosted I guess..

Probably want to check out https://www.kapa.ai/, no affiliation, just a fan.
GPT-4 + retrieval might be the fastest path. But quality is not guaranteed, and that assumes you do not mind uploading all your private info to OpenAI.

This project might be the best option where you can finetune an LLM on your data and keep the model yourself

He asked specifically for self-hosted solutions.