| HN Mirror


	by strangemonad 1022 days ago
	This argument has always felt to me like saying “Google has no moat in search, they just happen to currently have the best PageRank. Nothing is stopping Yahoo from creating a better one.”

4 comments

jdminhbg 1022 days ago

Google has a flywheel where its dominant position in search results in more users, whose data refines the search algorithm over time. The question is whether OpenAI has a similar thing going, or whether they have just done the best job so far of training a model against a static dataset. If they're able to incorporate customer usage to improve their models, that's a moat against competitors. If not, it's just a battle between groups of researchers and server farms to see who is best this week or next.

mbb70 1021 days ago

But that's exactly what they have: millions of high quality, rated chat interactions that no one else has.

I don't know how they could _not_ incorporate customer usage to improve their models.

omeze 1021 days ago

Well, this assumes the chat (where the ratings are given) is what people are using and paying for. I think most businesses pay for some combination of API access and specific use cases like code generation (at least, that's what I pay for) that don't really feed into RLHF data. General search for consumers is likely to splinter, since ChatGPT isn't especially different from Bard or Edge's AI assistant or the myriad other product surfaces that can add it.

zarzavat 1021 days ago

Yes, the chat interactions don’t help with capability (what it can do); they only help with alignment (what it should do). And you don’t need a lot of data to get good results. Crowdsourcing will be enough.

zarzavat 1022 days ago

It’s a different situation computationally. Transformers are asymmetric: hard to train but easy to run.

There is no such thing as an open source Google because Google’s value is in its vast data centers. Search is hard to train and hard to run.

GPT-4 is not that big. It’s about 220B parameters, if you believe geohot, or perhaps more if you don’t.

One hard drive.
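
As a rough back-of-the-envelope check on the "one hard drive" claim, here is a minimal sketch. It takes the 220B parameter figure quoted above at face value (it is unconfirmed), and the bytes-per-parameter widths are common storage precisions assumed for illustration, not anything OpenAI has stated.

```python
# Rough weight-storage estimate for a model of a given parameter count.
# Assumption: 220e9 parameters, the unconfirmed figure quoted in the comment.
# Assumption: weights are stored densely at one fixed precision.

def model_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 220e9  # hypothetical parameter count taken from the thread

# Common storage widths, chosen for illustration only.
PRECISIONS = {"fp32": 4, "fp16": 2, "int8": 1}

for name, width in PRECISIONS.items():
    print(f"{name}: {model_size_gb(PARAMS, width):.0f} GB")
```

Under those assumptions the weights come to somewhere between a few hundred gigabytes and just under a terabyte, which is within the capacity of a single consumer hard drive. Running the model is a separate question, since the weights would also have to fit in accelerator memory.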

shihab 1021 days ago

My understanding is that Google search is a lot more than just PageRank (MapReduce, for example). They had lots of heuristics, data, machine learning, etc. before anyone else.

Whereas the underlying algorithms behind all these GPTs so far are broadly the same. Yes, OpenAI probably does have better data, model fine-tuning and other engineering techniques now, but I don't feel it's anything special that'll allow them to differentiate themselves from competitors in the long run.

(If the data collected from current LLM users proves very valuable for improving the model, that's different. I personally think that's not the case now, but who knows.)

ra7 1021 days ago

Google's moat in search has always been systems and data center infrastructure. You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.

jjeaff 1021 days ago

I think it's also more than just systems and data centers. It is also difficult to scrape the web the way Google does without using Google IP addresses. A lot of the web now will block you or severely throttle you if you aren't one of the well-known engines that they want indexing them.

colinsane 1021 days ago

> You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.

rephrasing this for LLMs instead of search: "you can create your own model architecture/training method, but you can't crawl the web and serve language query results to billions of worldwide users in a few milliseconds."

that checks out, right? Google/search == """Open"""AI/LLMs still seems like a decent metaphor to me.
