The amount of capital needed to train these high-quality models is eye-watering (not to mention the cost of acquiring the data). Does anyone know of any well-capitalized startups exploring this space?
It's not like that many people are opening 40 room hotels either. Such amounts are atypical within programming and CS communities.
A more relevant example is video games: imagine if the only viable ones were top-end AAA games whose completed versions could only be accessed through cloud gaming.
I would not say that. Facebook, Microsoft and Google release plenty of useful models. EleutherAI have released 6 billion and 20 billion parameter language models. Huggingface has been training a 176B model [1].
The issue isn't a lack of models or data, it's that larger models are impossible to train without paying hundreds of thousands to millions of dollars. The hardware requirements for simply running the models already price them out of reach for most.
These models are rather powerful, but the immediate future is one of accessing them through cloud services. The GeForce GTX 1080 Ti came out 5 years ago; since then, memory in consumer GPUs has roughly doubled. To run the highest-end models on a single GPU, hardware will need 20x to 70x more memory, along with serious gains in FLOPS per joule.
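For a rough sense of that gap, here is a back-of-the-envelope sketch of weights-only memory. It assumes 2 bytes per parameter (fp16) and a 24 GB consumer card; both figures are my assumptions, not numbers from this thread. Activations and caches are ignored, so real requirements are higher. The parameter counts are the 6B, 20B and 176B models mentioned above.

```python
# Back-of-the-envelope: memory needed just to hold model weights.
# Assumptions (not from the thread): fp16 weights at 2 bytes per parameter,
# and a 24 GB consumer GPU as the baseline. Activations are ignored.

BYTES_PER_PARAM = 2      # assumed fp16 storage
CONSUMER_GPU_GB = 24     # assumed high-end consumer card


def weights_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Gigabytes (10^9 bytes) required to store the weights alone."""
    return n_params * bytes_per_param / 1e9


def gpus_worth(n_params: float, gpu_gb: float = CONSUMER_GPU_GB) -> float:
    """How many such GPUs' worth of memory the weights occupy."""
    return weights_gb(n_params) / gpu_gb


if __name__ == "__main__":
    for name, n in [("6B", 6e9), ("20B", 20e9), ("176B", 176e9)]:
        print(f"{name}: {weights_gb(n):.0f} GB of weights, "
              f"{gpus_worth(n):.1f}x a {CONSUMER_GPU_GB} GB card")
```

Under these assumptions the 176B model needs roughly 15 such cards for the weights alone, and storing weights at 4 bytes per parameter would double that.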
I suppose improvements in CPU parallelism and RAM speeds will also go a long way towards making such models runnable on reasonable consumer hardware, albeit at slower speeds.
Saying people lack the equipment to run them for inference isn't a good reason to not publish them. The astronomical training cost is a good reason to publish them.
The data here is effectively free. I don't think they would exhaust The Pile, which you can download for free. This is also true for text2image models like DALL-E 2: while OA may have invested in its own datasets, everyone else can just download LAION-400M (or if they are really ambitious, LAION-5B https://laion.ai/laion-5b-a-new-era-of-open-large-scale-mult... ).
OpenAI would be the best example. However, these large language models also have limited business value today, making a startup a speculative bet that the team will both beat Google/FB/AI/Academics at making a language model and find a viable business model for the resulting model.
I'd take one of those bets or the other, but pulling off both is tough. Considering that the first task of such a startup would be to hand ~100-500MM to a hardware or cloud vendor, I'd be hesitant to invest.
It costs less than 10M to train. Why hand so much to a hardware or cloud vendor? Soon enough there will be open source GPT-3s; at least two are in training as we speak (BigScience and EleutherAI).
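One way to sanity-check a figure like that is the common rule of thumb that training takes about 6 FLOPs per parameter per token. The sketch below applies it. The token count, per-GPU peak throughput, utilization and hourly price are all illustrative assumptions of mine, not numbers from this thread, and the result moves a lot if you change them.

```python
# Rough training-cost estimate using the ~6 * params * tokens FLOPs rule of thumb.
# Every default below is an illustrative assumption, not a quoted price or benchmark.


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs (forward plus backward pass)."""
    return 6.0 * n_params * n_tokens


def training_cost_usd(
    n_params: float,
    n_tokens: float,
    peak_flops_per_gpu: float = 312e12,  # assumed accelerator peak, FLOP/s
    utilization: float = 0.3,            # assumed fraction of peak actually achieved
    usd_per_gpu_hour: float = 2.0,       # hypothetical rental price
) -> float:
    """Estimated compute bill in USD for a single, uninterrupted training run."""
    effective_flops = peak_flops_per_gpu * utilization
    gpu_hours = training_flops(n_params, n_tokens) / effective_flops / 3600
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Assumed run: 175B parameters trained on 300B tokens.
    cost = training_cost_usd(175e9, 300e9)
    print(f"Estimated compute cost: ${cost / 1e6:.1f}M")
```

With these defaults a 175B-parameter run comes out at roughly $2M of compute. That excludes failed runs, experimentation, staff and data work, so treat it as a floor rather than a budget.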
> these large language models also have limited business value today
The Instruct version of GPT-3 has become very easy to steer with just a task description. It can do so many tasks so well it's crazy. Try some interactions with the beta API.
I believe GPT-3 is already above average human level at cognitive tasks that fit in a 4000 token window. In 2-3 years I think all developers will have to adapt to the new status quo.
Gut feeling based on playing with it. Here's an example:
> Colorless green ideas sleep furiously,
and other grammatical nonsense by Noam Chomsky
He was a man without a country,
A linguist without a language,
A mind without a thought,
A dream without a dreamer.
He was lost in a world of words,
A world where ideas slept furiously,
And grammar was a never-ending nightmare.
But he persevered,
For he knew that language was the key to understanding the world.
And so he continued to study,
To learn all that he could,
In the hopes that one day,
He would find his way home.
You should check out the post we're commenting on, it has graphs for this exact metric.
Spoiler: Google's model with 3x the parameters does pass the average human in a couple of categories, but not all. I don't think GPT-3 does in any.
It's doubly puzzling to me because you have access and are asserting it feels like an average human to you. It's awesome and it does magical stuff; I use it daily for both code and prose. It also majorly screws up sometimes. It is only at an average human level if we play word games with things like "well, the average human wouldn't know the Dart implementation of the 1D gaussian function. Therefore it's better than the average human."
Ok, your phrasing made it sound like some article or material had convinced you of this opinion on my first reading, now I understand.
This is kind of my point about 80 links though - you're using a definition of "cognitive tasks" that more closely resembles knowledge, and then you're letting your personal feelings about profundity guide your conclusions on said cognition.
I don't deny that the machine can output pretty words and has a breadth of knowledge to put us each to shame on some simple queries, but "cognition in a 4000 token window" is an incredibly large place and I don't even understand how you would be able to claim a machine has above-human-average cognition based solely on your own interactions... That's a pretty crazy leap.
PS: I saw the downvotes. I was downvoted for questioning the validity of information that was actually just pure conjecture. Be better with your votes.
I agree 100%, but I think viable businesses will begin to emerge especially as these large models move from text to images (and eventually to video and 3d models). If the examples shown of DALL-E 2 are indicative of its quality, then a large number of creative jobs could be replaced with a single "creative director" using the model. But the high entry cost just to attempt to train such a model will likely remain a hurdle until more business value is proven.
Aye - I suspect the other concern is that the high entry costs can quickly lead to a "second mover" advantage. The first team spends all the money doing the hard R&D and the second team implements a slightly better version for a fraction of the money.
It's relative. It would cost more to open a 40 room hotel (about 320k/room), and hotels can't be copied like software.