

by lelanthran 113 days ago

I disagree that the model is a moat; distillation of models is going to happen, and even without it all the current players have models that are virtually indistinguishable for the use-case.

Model capabilities have converged over time, and I don't see this trend reversing. OpenAI owns only the model.

The provider who does have a moat is Google - they own the entire vertical, from the hardware, to the training data, they have it all.

OpenAI has to buy GPUs, Google makes them.

OpenAI has to rent data centers. Google owns them.

OpenAI has to scrape the web for all its training data. Google's collection of user emails alone (not counting their Android data harvesting, ad data harvesting, user tracking, etc.) gives them a ton of training data that will never be available to scrapers.

Google has billions of signed-in users; OpenAI has to market to and attract users (an 800M user count last I checked, but also, last I checked, that growth was flattening out).

That's what a moat looks like. Better technology and/or results has never been, in my memory, a moat.


energy123 113 days ago

Good points about Google.

I think where I don't agree is about the model. You're mostly correct right now, and your view is supported by how close everyone is.

Where I am more optimistic about the 2-4 biggest labs (not just OpenAI) is in what the next 2 years look like.

I expect this to happen:

- Synthetic data goes from 30% of training data to 90-97%+ of training data.

- Synthetic data becomes hugely varied, and the production of it is factory-like and parallelized.

The moat here is the data factory, and the scale/scope economies behind it.

Thoughts?

lelanthran 113 days ago

Look, I'm upvoting your posts in this thread because you make some good points, but I'm not really convinced that a) synthetic data will result in good models, nor that b) quality synthetic data can be generated by labs outside of those orgs that have a ton of user-info.

This is why I say that OpenAI has no moat - even if synthetic data (however it is generated) is 90% of training data, there are still only two possibilities:

1. You need a ton of real user data to seed the synthetic generation (after all, it's not produced out of thin air), and orgs like Google, Microsoft and Amazon have exactly that.

or

2. You don't need a ton of real data to seed the synthetic generation.

In the first case, yes, that looks like a moat, but for Google et al., not for OpenAI.

In the second case, what's to stop an upstart from producing their own synthetic training data?

In either case, companies who provide only tokens (OpenAI, Anthropic, etc) don't have a moat. The moat is still the same as it was in the 90s - companies deeply embedded into users' workflows.

Like I said, I struggle to think of even a few successful moats in my memory that were purely technological. The moat is always something else.