Hacker News new | ask | show | jobs
by nullc 1040 days ago
How do these goals set out the needed ratios of memory to compute to interconnect bandwidth?

An ideal machine designed to train GPT4 in a day is likely very different to the ideal machine to train 50 GPT4s as once over a few weeks, which is very different from the ideal machine to train a model 100x bigger than GPT4 (perhaps the most interesting).