| Let's go through your guesstimates one by one: 1. We don't know the parameter count: it could be 175B, 250B, or 400B. OK, let's stick with 250B. 2. Training data: GPT-3 was trained on 300B tokens. That already used most of the high-quality data available on the internet, but let's say they somehow managed to find and prepare three times as much high-quality data for GPT-4. That puts GPT-4 at about 1T training tokens. 3. 5.4e+17 FLOPs/hour means 150 TFLOPS sustained, which is about half of the A100's theoretical BF16 peak; sounds reasonable. 4. $1/A100/hr is reasonable. OK, so we need to divide your cost estimate by a factor of 15: the total cost to train GPT-4 comes out to around $2.7M. Regarding Altman's statement about "more than 100M to train GPT-4": I'm pretty sure he was talking about the total cost to develop GPT-4, which includes a lot of experimentation and exploration, many training runs, and administrative costs that are not relevant to the cost of a single training run that reproduces the existing results. Salaries alone: ~200 people worked on GPT-4 for, let's say, half a year at $400k/year: 0.5 * 400k * 200 = $40M. |
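For anyone who wants to rerun the numbers, here is a rough sketch of the arithmetic. One loud assumption that is not stated in the comment: total training compute is approximated as 6 × parameters × tokens FLOPs, a common rule of thumb for dense transformers. The function and constant names are made up for illustration.

```python
# Back-of-envelope training cost using the figures from the comment above.
# ASSUMPTION (not stated in the comment): total training compute is
# approximated as 6 * parameters * tokens FLOPs.

PARAMS = 250e9               # point 1: parameter count picked for the estimate
TOKENS = 1e12                # point 2: about 1T training tokens
FLOPS_PER_GPU_HOUR = 5.4e17  # point 3: sustained throughput per A100-hour
USD_PER_GPU_HOUR = 1.0       # point 4: $1 per A100 per hour


def sustained_tflops(flops_per_hour: float) -> float:
    """Convert FLOPs per hour into TFLOPS (1e12 FLOPs per second)."""
    return flops_per_hour / 3600 / 1e12


def training_cost_usd(params: float, tokens: float,
                      flops_per_gpu_hour: float,
                      usd_per_gpu_hour: float) -> float:
    """Cost of one training run under the 6 * N * D compute assumption."""
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour


def salary_cost_usd(people: int, years: float, usd_per_year: float) -> float:
    """Salary bill: headcount * duration in years * annual pay."""
    return people * years * usd_per_year


if __name__ == "__main__":
    cost = training_cost_usd(PARAMS, TOKENS, FLOPS_PER_GPU_HOUR, USD_PER_GPU_HOUR)
    print(f"Sustained throughput: {sustained_tflops(FLOPS_PER_GPU_HOUR):.0f} TFLOPS")
    print(f"Single training run:  ${cost / 1e6:.1f}M")
    print(f"Salaries:             ${salary_cost_usd(200, 0.5, 400_000) / 1e6:.0f}M")
```

Under that assumption, the single run comes out at about $2.8M, in line with the ~$2.7M figure above, and the salary estimate alone is more than ten times larger.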