I'm in the business of mi300x. This comment nails it.
In general, the $2 GPUs are either PE venture losing money, long contracts, huge quantities, pcie, slow (<400G) networking, or some other limitation, like unreliable uptime on some bitcoin miner that decided to pivot into the GPU space and has zero experience on how to run these more complicated systems.
Basically, all the things that if you decide to build and risk your business on these sorts of providers, you "get what you pay for".
Distributed training data creating & curation is more useful and feasible. Training gets cheaper 1.5x every year, but data is just as expensive, if not more, given that the era of "free web crawls of human knowledge" is over.
I agree with you, but as the article mentioned, if you need to finetune a small/medium model you really don't need clusters. Getting a whole server with 8/16x H100s is more than enough. And I also believe with the article when it states that most companies are finetuning some version of llama/open-weights models today.
In general, the $2 GPUs are either PE venture losing money, long contracts, huge quantities, pcie, slow (<400G) networking, or some other limitation, like unreliable uptime on some bitcoin miner that decided to pivot into the GPU space and has zero experience on how to run these more complicated systems.
Basically, all the things that if you decide to build and risk your business on these sorts of providers, you "get what you pay for".