by covi 1180 days ago
From the post:

> The training was done with PyTorch FSDP on 8 A100 GPUs in one day.

> We employ SkyPilot managed spot to reduce the cost by leveraging the cheaper spot instances with auto-recovery for preemptions and auto zone switch. This solution slashes costs for training the 7B model from $500 to around $140 and the 13B model from around $1K to $300.
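For reference, the quoted figures work out to roughly a 70% cost reduction for both models. A quick sketch of that arithmetic, using only the dollar amounts from the quote (the function name is made up for illustration, and the 13B figures are the approximate "$1K" and "$300" from the post):

```python
def savings_fraction(on_demand_usd: float, spot_usd: float) -> float:
    """Fraction of the on-demand cost saved by using spot instances."""
    return (on_demand_usd - spot_usd) / on_demand_usd


# Dollar amounts quoted in the post (approximate).
print(f"7B:  {savings_fraction(500, 140):.0%} cheaper")
print(f"13B: {savings_fraction(1000, 300):.0%} cheaper")
```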

So this would be, for example, an a2-ultragpu-8g (8x A100-80GB) spot instance on GCP. You can use SkyPilot to quickly check that the price is $12.8 per hour (~$307 for a day):

» sky launch --gpus A100-80GB:8 --use-spot
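A quick sanity check of the daily figure, assuming the instance runs for a full 24 hours and no time is lost to preemptions (the helper name is hypothetical; the hourly price is the one shown above):

```python
def daily_cost(hourly_usd: float, hours: float = 24.0) -> float:
    """Total cost in USD of running an instance for `hours` at a flat hourly rate."""
    return hourly_usd * hours


# $12.8/hour is the spot price for 8x A100-80GB mentioned above.
print(f"${daily_cost(12.8):.0f} per day")
```

That lines up with the roughly $300 the post quotes for the 13B model.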

Check out detailed CLI instructions and SkyPilot YAMLs here if you want to give it a try:

- https://github.com/lm-sys/FastChat#vicuna

- https://github.com/lm-sys/FastChat/blob/main/scripts/train-v...

1 comment

What’s wrong with vast.ai? You’d probably be looking at $2/hr, so about $50 for the whole fine-tuning.