by rushingcreek
1029 days ago
To maximize quality we didn't want to use LoRA, so we did a native fine-tune on 32 A100-80GB GPUs with a sequence length of 4096. It's possible to do a native fine-tune on as few as 8 A100-80GB GPUs with DeepSpeed ZeRO Stage 3, but it will take longer. With LoRA you can probably get away with just a few 4090s.
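For anyone wanting to sanity-check the GPU counts, here is a rough sketch. It uses the common rule of thumb of about 16 bytes per parameter for mixed-precision Adam training, and it assumes ZeRO Stage 3 splits weights, gradients, and optimizer states evenly across GPUs. The 34B parameter count, the function names, and the batch settings are my own assumptions for illustration, not details of the setup described above. Activations and buffers are not counted, so real usage will be higher.

```python
import json

# Rule of thumb for mixed-precision Adam: 2 bytes (16-bit weights)
# + 2 bytes (16-bit gradients) + 12 bytes (fp32 weights, momentum, variance).
BYTES_PER_PARAM = 16


def model_state_gb_per_gpu(num_params: float, num_gpus: int) -> float:
    """Approximate per-GPU model-state memory (GB) under ZeRO Stage 3.

    Stage 3 partitions weights, gradients, and optimizer states across GPUs.
    Activations and temporary buffers are NOT included in this estimate.
    """
    return num_params * BYTES_PER_PARAM / num_gpus / 1e9


def zero3_config(micro_batch_size: int = 1, grad_accum_steps: int = 1) -> dict:
    """Minimal DeepSpeed config enabling ZeRO Stage 3 (values are placeholders)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights when saving a checkpoint.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


if __name__ == "__main__":
    HYPOTHETICAL_PARAMS = 34e9  # assumed model size, for illustration only
    for gpus in (8, 32):
        gb = model_state_gb_per_gpu(HYPOTHETICAL_PARAMS, gpus)
        print(f"{gpus} GPUs: ~{gb:.0f} GB of model state per GPU")
    print(json.dumps(zero3_config(), indent=2))
```

Under those assumptions, 8 GPUs would leave little of each card's 80 GB for activations, which would likely force small micro-batches and is one plausible reason the smaller setup runs slower.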