by hannesfur
443 days ago
In a sense, you are not wrong! But when we got started, we thought it would be way easier than it actually was. Procuring powerful GPUs alone is difficult, and so is collecting proper data. Of course, you can still do everything yourself. If you want to give it a try, I would recommend taking a look at torchtune (https://github.com/pytorch/torchtune).
I was working at a startup doing end-to-end training for modified BERT architectures, and everything about it seemed designed to burn money as fast as possible:
From buying GPUs (basically impossible right now; we ended up looking at sourcing franken cards _from_ China),
to power and heat removal (you need a large factory's worth of power in the space of a small flat),
to pre-training an architecture that has never been pre-trained before (say hello to throwing out more than 80% of pretraining runs because the architecture is novel).
Without hugely deep pockets, a contract with NVIDIA, and a datacenter right next to a nuclear power plant, you can't compete at the model level.