The issue is the data moat OAI is building. They'll have hundreds of millions of high-quality user interactions with ChatGPT that they can use to fine-tune their models. What will anyone else, including Google, have?
Google has been collecting user interactions since 2007 via GOOG-411, which was a precursor to the Google Assistant - I suspect Google has billions of user interactions on hand through the latter. Facebook has posts and comments, Amazon has product pages, reviews and product Q&As, and all of them have billions of dollars to draw upon if they choose to buy high-quality data, or spin up / expand teams that create and/or categorize training data.
They also have a deep roster of AI researchers[1] who could potentially obsolete LLMs or make fine-tuning work without access to ChatGPT records.
1. I suspect Google alone has more AI researchers than OpenAI has employees
I'm not sure how deep that moat is. As soon as you open up the API, anyone can distil ChatGPT (or at least, some smaller part of it) by fine-tuning another model on its outputs[0].
I'm guessing that this is the #1 fear people inside OpenAI have right now.
[0] For the record, I have zero problem with this.
1. ToS make it hard for a commercial entity to do so. So some third parties would have to collect the data first
2. You won't be able to get the hundreds of millions or more interactions that OAI will have (both because of the cost of the API and because it's not easy to figure out a good way to generate that many queries for a good multi-turn conversation). Maybe you can make up for it by querying smartly. We don't know if we can right now.
As people build chatbots with OpenAI and tie them into existing chat services, the organizations that offer those chat services will get their hands on that kind of data too.