This argument has always felt to me like saying “Google has no moat in search, they just happen to currently have the best PageRank. Nothing is stopping Yahoo from creating a better one.”
Google has a flywheel: its dominant position in search brings in more users, whose data refines the search algorithm over time. The question is whether OpenAI has a similar flywheel going, or whether they have simply done the best job so far of training a model against a static dataset. If they're able to incorporate customer usage to improve their models, that's a moat against competitors. If not, it's just a battle between groups of researchers and server farms to see who is best this week or next.
Well, this assumes the chat (where the ratings are given) is what people are using and paying for. I think most businesses pay for some combination of API access and specific use cases like code generation (at least, that's what I pay for) that don't really contribute RLHF data. General search for consumers is likely to fragment, since ChatGPT isn't especially different from Bard, Edge's AI assistant, or the myriad other products that can add it.
Yes, the chat interactions don’t help with capability (what it can do); they only help with alignment (what it should do). And you don’t need much of that data to get good results. Crowdsourcing will be enough.
My understanding is that Google search is a lot more than just PageRank (MapReduce, for example). They had lots of heuristics, data, and machine learning before anyone else, etc.
Whereas the underlying algorithms behind all these GPTs so far are broadly the same. Yes, OpenAI probably does have better data, model fine-tuning, and other engineering techniques now, but I don't feel it's anything special that will allow them to differentiate themselves from competitors in the long run.
(If the data collected from current LLM users proves very valuable for improving the model, that's different. I personally don't think that's the case now, but who knows.)
Google's moat in search has always been systems and data center infrastructure. You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.
I think it's also more than just systems and data centers. It is also difficult to scrape the web the way Google does without using Google IP addresses. A lot of the web now will block you or severely throttle you if you aren't one of the well-known engines that they want indexing them.
> You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.
Rephrasing this for LLMs instead of search: "you can create your own model architecture/training method, but you can't crawl the web and serve language query results to billions of worldwide users in a few milliseconds."
That checks out, right? Google/search == """Open"""AI/LLMs still seems like a decent metaphor to me.