by p1esk 1137 days ago
Let's go through your guesstimates one by one:

1. We don't know the number of parameters: it could be 175B, 250B, or 400B. OK, let's stick with 250B.

2. Training data: GPT-3 was trained on 300B tokens. It already used most of the high-quality data available on the internet, but let's say they somehow managed to find and prepare three times as much high-quality data for GPT-4. That puts GPT-4 at about 1T tokens (3 x 300B = 900B, rounded up).

3. 5.4e+17 FLOPs/hour works out to 150 TFLOPS (5.4e17 / 3600 s), which is about half of the A100's theoretical peak BF16 throughput (312 TFLOPS). Sounds reasonable.

4. $1/A100/hr is reasonable.

OK, so we need to divide your cost estimate by a factor of 15: the total cost to train GPT-4 comes out to around $2.7M.
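For anyone who wants to check the arithmetic, here is a rough sketch in Python. The inputs are the guesses from items 1 to 4 above, not confirmed figures. The one thing I'm adding is the common rule of thumb of about 6 FLOPs per parameter per training token, which is an assumption on my part and only an approximation; the function names are made up for illustration.

```python
# Back-of-the-envelope training cost, using the guesses from the list above.
# None of these inputs are confirmed numbers for GPT-4.

PARAMS = 250e9                 # item 1: guessed parameter count
TOKENS = 1e12                  # item 2: guessed training tokens
FLOPS_PER_GPU_HOUR = 5.4e17    # item 3: sustained FLOPs per A100 per hour
USD_PER_GPU_HOUR = 1.0         # item 4: guessed price per A100-hour
A100_BF16_PEAK_TFLOPS = 312.0  # A100 dense BF16 peak from NVIDIA's spec sheet

# Assumption (not from the thread): ~6 FLOPs per parameter per token
# for a forward plus backward pass of a dense transformer.
FLOPS_PER_PARAM_PER_TOKEN = 6.0


def sustained_tflops(flops_per_hour: float) -> float:
    """Convert FLOPs per hour into TFLOPS (1e12 FLOPs per second)."""
    return flops_per_hour / 3600.0 / 1e12


def training_cost_usd(params: float, tokens: float,
                      flops_per_gpu_hour: float,
                      usd_per_gpu_hour: float) -> float:
    """Estimate the cost of one training run under the 6*N*D approximation."""
    total_flops = FLOPS_PER_PARAM_PER_TOKEN * params * tokens
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    tflops = sustained_tflops(FLOPS_PER_GPU_HOUR)
    utilization = tflops / A100_BF16_PEAK_TFLOPS
    cost = training_cost_usd(PARAMS, TOKENS,
                             FLOPS_PER_GPU_HOUR, USD_PER_GPU_HOUR)
    print(f"sustained throughput: {tflops:.0f} TFLOPS "
          f"({utilization:.0%} of BF16 peak)")
    print(f"estimated single-run cost: ${cost / 1e6:.2f}M")
```

With these inputs the estimate lands just under $2.8M, in line with the rough figure above. Change any one guess and the cost scales linearly with it, so 400B parameters instead of 250B would raise it by 1.6x.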

Regarding Altman's statement about "more than $100M to train GPT-4": I'm pretty sure he was talking about the total cost to develop GPT-4. That includes a lot of experimentation and exploration, many training runs, and plenty of administrative costs, none of which are relevant to the cost of a single training run that reproduces the existing results. Salaries alone: ~200 people worked on GPT-4 for, let's say, half a year at $400k/year: 0.5 * 400k * 200 = $40M.