|
|
|
|
|
by nullc
1040 days ago
|
|
How do these goals set out the needed ratios of memory to compute to interconnect bandwidth? An ideal machine designed to train GPT4 in a day is likely very different to the ideal machine to train 50 GPT4s as once over a few weeks, which is very different from the ideal machine to train a model 100x bigger than GPT4 (perhaps the most interesting). |
|