Just want to add that there are efforts to improve training speed, like this: https://github.com/Lightning-AI/lit-llama/issues/62
So the practical cost/dataset size for language finetunes is bound to get better rapidly.
EDIT: There is also this for JAX finetuning: https://github.com/young-geng/EasyLM/blob/main/docs/llama.md