| This part caught my eye: "Using a half-precision FSDP full shard with a 1024 sequence length and a micro batch size of 2 required 63GB of VRAM on each of the eight A100 80 GB GPUs. The training, lasting three epochs, took just 20 minutes. The total cost for the VM was $8.88 per hour, resulting in $3, not including the time for experiments and bug fixes." I wondered where you could rent cycles on a machine like that, a quick Google found that p4d.24xlarge on AWS is available, while the on-demand cost is $20.1755 per hour, the Spot is only $8.99 (I guess it's gone up?) Cool to know I could fine-tune for only ~$3. |