

by PaulHoule 1199 days ago

See

https://en.wikipedia.org/wiki/Hutter_Prize

GPT-3 is said to have 175 billion parameters. If those are float32s (4 bytes each, and I bet they could get away with less than that), it would be 700 GB of data. Wikipedia also says that "60 percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting of 410 billion byte-pair-encoded tokens".

That would put the whole dataset at about 680B tokens (410B / 0.6). Say the average token is 5 characters; that is 3400B characters of text, so the model is the input "compressed" to about 20% of its size (700 GB out of roughly 3.4 TB, at one byte per character), a ratio that state-of-the-art text compressors can achieve.
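The arithmetic above, spelled out as a quick script. Every input is an estimate taken from the comment, plus one added assumption: that the text is mostly ASCII, so one character is one byte.

```python
# Back-of-envelope estimate; all inputs are rough assumptions, not official figures.
PARAMS = 175e9          # GPT-3 parameter count
BYTES_PER_PARAM = 4     # float32
CC_TOKENS = 410e9       # Common Crawl tokens, said to be 60% of the weighted mix
CC_SHARE = 0.60
CHARS_PER_TOKEN = 5     # guessed average token length
BYTES_PER_CHAR = 1      # assumption: mostly ASCII text

model_bytes = PARAMS * BYTES_PER_PARAM                          # 700 GB
total_tokens = CC_TOKENS / CC_SHARE                             # ~683 billion tokens
text_bytes = total_tokens * CHARS_PER_TOKEN * BYTES_PER_CHAR    # ~3.4 TB
ratio = model_bytes / text_bytes                                # model size / text size

print(f"model:  {model_bytes / 1e9:.0f} GB")
print(f"tokens: {total_tokens / 1e9:.0f} B")
print(f"text:   {text_bytes / 1e12:.2f} TB")
print(f"ratio:  {ratio:.1%}")
```

Without rounding the token count to 680B first, the ratio comes out at about 20.5%.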

Now, my figures could be off: they might be encoding the parameters more efficiently, and the average token could be longer. But it seems to make sense that if you trained a model to capture as much information as it possibly could from the text, it would end up about that size. Given that this kind of model seems able to spit out what it was trained on (though sometimes garbled), that might be about right.
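A small sensitivity sweep shows how much those two caveats matter. The float16 and int8 storage options and the 4- and 6-character token lengths are illustrative values, not figures reported for GPT-3. The sweep also assumes one byte per character of training text.

```python
# Hypothetical sensitivity sweep: how the "compression ratio" moves if parameters
# are stored more compactly or tokens are longer or shorter than 5 characters.
PARAMS = 175e9                   # GPT-3 parameter count
TOTAL_TOKENS = 410e9 / 0.60      # Common Crawl tokens scaled up from the 60% share


def ratio(bytes_per_param: float, chars_per_token: float) -> float:
    """Model size divided by training-text size, assuming 1 byte per character."""
    return (PARAMS * bytes_per_param) / (TOTAL_TOKENS * chars_per_token)


for bpp, label in [(4, "float32"), (2, "float16"), (1, "int8")]:
    row = "  ".join(f"{cpt} ch/tok: {ratio(bpp, cpt):6.1%}" for cpt in (4, 5, 6))
    print(f"{label:8s} {row}")
```

The ratio scales linearly with bytes per parameter and inversely with token length. Under these assumptions, float16 storage with 5-character tokens gives about 10%.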