| HN Mirror



	by sacred_numbers 1139 days ago
	I would bet money against that. Replicating GPT-4 pre-training with current hardware would cost about 40-50m in compute. Compute will continue to decrease in cost and algorithmic improvements may allow for more efficient training, but probably not 3 orders of magnitude in a few years. I think there will be plenty of open source models that will claim GPT-4 quality, and some of them will be close, but they will be models that used millions of dollars (probably from some corporate benefactor but possibly from crowdsourcing) in compute to train. You will probably be able to fine-tune and run inference on fairly cheap hardware, but you can't cheat scale. It's going to take a major innovation to move away from the expensive base model paradigm.

3 comments

p1esk 1139 days ago

>Replicating GPT-4 pre-training with current hardware would cost about 40-50m in compute.

Source? My educated guess is that it’s somewhere between 10 and 100 times cheaper than that.


sacred_numbers 1138 days ago

I did my own calculations based on plotting loss on benchmarks compared to models with known parameters and training data, as well as using a quote from Sam Altman that said that GPT-4 would not use very many more parameters than GPT-3. Based on this, I estimated that GPT-4 probably used about 250B parameters, and since I had an estimate for the total compute I was able to estimate that the training data was about 15T tokens. 250B parameters times 15T tokens times 6 (https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-la...) means the compute was about 2.25×10^25 FLOPs. I estimated that A100s cost about $1/hr and can process about 5.4×10^17 FLOPs per hour at 50% efficiency. Therefore, the compute cost would be (2.25×10^25)/(5.4×10^17) GPU-hours at $1 each, or about $40 million.
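The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate as stated in this comment: the 6 FLOPs per parameter per token rule of thumb, 250B parameters, 15T tokens, 5.4×10^17 FLOPs per A100-hour, and $1 per A100-hour are all the guesses quoted above, not confirmed figures, and the function names are made up for illustration.

```python
# Back-of-the-envelope training cost. All inputs are the guesses
# from the comment above, not confirmed numbers.

def training_flops(params, tokens, flops_per_param_token=6):
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return flops_per_param_token * params * tokens

def training_cost_usd(total_flops, flops_per_gpu_hour, usd_per_gpu_hour):
    """Convert total FLOPs to GPU-hours, then to dollars."""
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour

flops = training_flops(params=250e9, tokens=15e12)   # assumed model and data size
cost = training_cost_usd(flops,
                         flops_per_gpu_hour=5.4e17,  # assumed A100 at 50% efficiency
                         usd_per_gpu_hour=1.0)       # assumed rental price

print(f"{flops:.3g} FLOPs, ${cost / 1e6:.1f}M")
```

With these inputs the result lands a little above $40 million, which is where the rounded figure comes from.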

Interestingly, my own calculations lined up pretty well with this calculation, although they approached the problem from a different direction (a leak by Morgan Stanley about how many GPUs OpenAI used to train GPT-4 as well as an estimate of how long it was trained): https://colab.research.google.com/drive/1O99z9b1I5O66bT78r9S...

Sam Altman has also stated that GPT-4 cost more than $100 million to train, and replication can cost 2-4x less compute. https://www.wired.com/story/openai-ceo-sam-altman-the-age-of...

If you know of an organization that can replicate GPT-4 for $400k to $4m I would love to know so that I can invest in them.


p1esk 1138 days ago

Let's go through your guesstimates one by one:

1. We don't know what the number of parameters is, could be 175B, could be 250B, could be 400B. Ok, let's stick with 250B.

2. Training data: GPT-3 was trained on 300B tokens. It already used most of the high-quality data available on the internet, but let's say they somehow managed to find and prepare three times as much high quality data for GPT-4. This means GPT-4 was trained on about 1T tokens.

3. 5.4e+17 FLOPs/hour works out to 150 TFLOP/s sustained, which is about half of the A100's theoretical BF16 peak. Sounds reasonable.

4. $1/A100/hr is reasonable.

OK, so we need to divide your cost estimate by a factor of 15: Total cost to train GPT-4 comes out to be around $2.7M.
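As a rough check of points 1-4, here is the same formula rerun with 1T tokens instead of 15T. It is a sketch under the assumptions stated in this thread (250B parameters, 6 FLOPs per parameter per token, 5.4e+17 FLOPs per A100-hour, $1 per A100-hour); `cost_usd` is a hypothetical helper, not anyone's actual tooling.

```python
SECONDS_PER_HOUR = 3600

def cost_usd(params, tokens, flops_per_gpu_hour=5.4e17, usd_per_gpu_hour=1.0):
    """6 * params * tokens FLOPs, converted to GPU-hours and then dollars.

    The defaults are the thread's guesses for an A100, not measured values.
    """
    return 6 * params * tokens / flops_per_gpu_hour * usd_per_gpu_hour

# Point 3: FLOPs per hour expressed as sustained TFLOP/s.
sustained_tflops = 5.4e17 / SECONDS_PER_HOUR / 1e12

cost_15t = cost_usd(250e9, 15e12)  # the original 15T-token guess
cost_1t = cost_usd(250e9, 1e12)    # the 1T-token guess from point 2
ratio = cost_15t / cost_1t         # cost scales linearly with token count

print(f"{sustained_tflops:.0f} TFLOP/s sustained")
print(f"15T tokens: ${cost_15t / 1e6:.1f}M, 1T tokens: ${cost_1t / 1e6:.1f}M, ratio {ratio:.0f}x")
```

Because cost is linear in tokens, the factor of 15 follows directly from the token counts. The unrounded 1T-token figure comes out slightly higher than $40M divided by 15, since the 15T-token estimate is itself a bit above $40M.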

Regarding Altman's statement about "more than 100M to train GPT-4" - I'm pretty sure he was talking about the total cost to develop GPT-4, which includes a lot of experimentation and exploration, many training runs, and many other administrative costs which are not relevant to the cost of a single training run to reproduce the existing results. Just salaries alone: ~200 people worked on GPT-4 for let's say half a year, at $400k/year: 0.5 * 400k * 200 = $40M.


rl3 1138 days ago

>Source? My educated guess is that it’s somewhere between 10 and 100 times cheaper than that.

Actually:

https://www.wired.com/story/openai-ceo-sam-altman-the-age-of...

At the MIT event, Altman was asked if training GPT-4 cost $100 million; he replied, “It’s more than that.”

Granted, OP did say pre-training.


RelativeDelta 1139 days ago

Especially if you consider that as compute costs decrease, so does the ability of scale players to process larger datasets.

If we extrapolate that relation, you eventually reach a point where the biggest player can collect and process the most information and produce an ever-evolving model to maintain that relation.

Better hope its creators have your best interests at heart.


polski-g 1137 days ago

What happens when people can network their GPUs together (like SETI@home) and a group of thousands of consumers can train GPT-5000?
