

by emmender2 811 days ago

this proves that all llm models converge to a certain point when trained on the same data. i.e., there is really no differentiation between one model and another.

Claims about out-performance on tasks are just that, claims. the next iteration of llama or mixtral will converge.

LLMs seem to evolve like linux/windows or ios/android with not much differentiation in the foundation models.

10 comments

jobigoud 811 days ago

It's even possible they converge when trained on different data, if they are learning some underlying representation. There was recent research on face generation where they trained two models by splitting one training set in two without overlap, and got the two models to generate similar faces for similar conditioning, even though each model hadn't seen anything that the other model had.

IshKebab 811 days ago

That sounds unsurprising? Like if you take any set of numbers, randomly split it in two, then calculate the average of each half... it's not surprising that they'll be almost the same.

If you took two different training sets then it would be more surprising.

Or am I misunderstanding what you mean?
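A minimal sketch of the averaging intuition above, using synthetic Gaussian numbers. The distribution, sample size, and seeds are assumptions chosen for illustration; nothing here comes from the face-generation research mentioned earlier.

```python
import random
import statistics


def split_half_means(values, seed=0):
    """Randomly split values into two disjoint halves and return both means."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return statistics.fmean(shuffled[:mid]), statistics.fmean(shuffled[mid:])


# Synthetic data: 100,000 draws from one population (mean 50, std dev 10).
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

mean_a, mean_b = split_half_means(population)
print(f"half A mean: {mean_a:.3f}, half B mean: {mean_b:.3f}")
print(f"difference:  {abs(mean_a - mean_b):.3f}")
```

With halves this large, the two means should differ by only a small fraction of the standard deviation, even though the halves share no elements.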

MajimasEyepatch 811 days ago

It doesn't really matter whether you do this experiment with two training sets created independently or one training set split in half. As long as both are representative of the underlying population, you would get roughly the same results. In the case of human faces, as long as the faces are drawn from roughly similar population distributions (age, race, sex), you'll get similar results. There's only so much variation in human faces.

If the populations are different, then you'll just get two models that have representations of the two different populations. For example, if you trained a model on a sample of all old people and separately on a sample of all young people, obviously those would not be expected to converge, because they're not drawing from the same population.

But that experiment of splitting one training set in half does tell you something: the model is building some sort of representation of the underlying distribution, not just overfitting and spitting out chunks of copy-pasted faces stitched together.

evrial 810 days ago

That's an explanation of the central limit theorem in statistics. And any language is mostly statistics, and models are good at statistically guessing the next word or token.

taneq 811 days ago

If both are sampled from the same population then they’re not really independent, even if they’re totally disjoint.

evrial 810 days ago

They are sourced mostly from the same population and crawled from everything that can be crawled.

Tubbe 811 days ago

Got a link for that? Sounds super interesting

d_burfoot 811 days ago

https://en.wikipedia.org/wiki/Theory_of_forms

bobbylarrybobby 811 days ago

I mean, faces are faces, right? If the training data set is large and representative I don't see why any two (representative) halves of the data would lead to significantly different models.

arcticfox 811 days ago

I think that's the point; language is language.

If there's some fundamental limit of what type of intelligence the current breed of LLMs can extract from language, at some point it doesn't matter how good or expansive the content of the training set is. Maybe we are finally starting to hit an architectural limit at this point.

dumbfounder 811 days ago

But information is not information. They may be able to talk in the same style, but not about the same things.

swalsh 811 days ago

The models are commodities, and the APIs are even similar enough that there is zero stickiness. I can swap one model for another, and usually not have to change anything about my prompts or RAG pipelines.

For startups, the lesson here is don't be in the business of building models. Be in the business of using models. The cost of using AI will probably continue to trend lower for the foreseeable future... but you can build a moat in the business layer.
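A rough sketch of what "swap one model for another" can look like in application code. Everything here is hypothetical: `ModelRegistry`, `fake_model_a`, and `fake_model_b` are stand-ins invented for illustration, not any vendor's SDK. A real backend would wrap an actual API call behind the same prompt-in, text-out signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" is any callable mapping a prompt string to a completion string.
Completion = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Maps hypothetical backend names to interchangeable completion callables."""
    backends: Dict[str, Completion]
    active: str

    def complete(self, prompt: str) -> str:
        return self.backends[self.active](prompt)

    def swap(self, name: str) -> None:
        if name not in self.backends:
            raise KeyError(f"unknown backend: {name}")
        self.active = name


# Stand-in backends (assumptions); real ones would call a provider's API.
def fake_model_a(prompt: str) -> str:
    return f"[model-a] {prompt}"


def fake_model_b(prompt: str) -> str:
    return f"[model-b] {prompt}"


registry = ModelRegistry(
    backends={"model-a": fake_model_a, "model-b": fake_model_b},
    active="model-a",
)


def answer(question: str) -> str:
    # Application code builds the prompt once and never names a vendor.
    prompt = f"Answer briefly: {question}"
    return registry.complete(prompt)


print(answer("What is RAG?"))
registry.swap("model-b")  # the prompt template is untouched by the swap
print(answer("What is RAG?"))
```

The point of the design is that the prompt template and pipeline live above the registry, so switching backends is a one-line change, assuming the models really do respond comparably to the same prompt.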

spxneo 811 days ago

Excellent comment. Shows good awareness of economic forces at play here.

We are just going to use whatever LLM is best fast/cheap and the giants are in an arms race to deliver just that.

But only two companies in this epic techno-cold war have an economic moat but the other moat is breaking down inside the moat of the other company. The moat inside the moat cannot run without the parent moat.

rayval 811 days ago

Intriguing comment that I don't quite follow. Can you please elaborate?

stolsvik 810 days ago

Probably OpenAI running on Azure. But it was still convoluted.

stri8ed 811 days ago

Or be in the business of building infrastructure for AI inference.

cheselnut 811 days ago

Is this not the same argument? There are like 20 startups and cloud providers all focused on AI inference. I'd think the application layer receives the most value accretion in the next 10 years vs. AI inference. Curious what others think.

sparks1970 811 days ago

Or be in the business of selling .ai domain names.

sroussey 811 days ago

Embeddings are not interchangeable. However, you can set up your system to have multiple embeddings from different providers for the same content.
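One way to picture storing multiple embeddings for the same content is sketched below. The provider names and toy vectors are made up for illustration. The key constraint it encodes is that a query vector is only ever compared against document vectors from the same provider, since vectors from different embedding models live in unrelated spaces (and often have different dimensions).

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List


@dataclass
class Document:
    text: str
    # provider name -> vector; vectors from different providers are never mixed
    embeddings: Dict[str, List[float]] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors come from different embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def search(docs: List[Document], query_vec: List[float], provider: str) -> List[str]:
    """Rank documents by cosine similarity using only `provider`'s vectors."""
    scored = [
        (cosine(doc.embeddings[provider], query_vec), doc.text)
        for doc in docs
        if provider in doc.embeddings
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Toy vectors (assumptions): provider_a is 2-dimensional, provider_b is 3-dimensional.
docs = [
    Document("cats", {"provider_a": [1.0, 0.0], "provider_b": [0.0, 1.0, 0.0]}),
    Document("dogs", {"provider_a": [0.0, 1.0], "provider_b": [1.0, 0.0, 0.0]}),
]

print(search(docs, [0.9, 0.1], "provider_a"))
print(search(docs, [0.9, 0.1, 0.0], "provider_b"))
```

Switching providers then means re-embedding the query with the new provider and reading a different key, rather than re-indexing everything at switch time.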

jimmySixDOF 811 days ago

There are people who make the case for custom fine-tuned embedding models built to match your specific types of data and associations. Whatever you use internally gets converted to the format of your foundation model of choice by its tools at the edge. Still, embeddings and the chunking strategies feeding into them are both far too underappreciated as parts of the whole pipeline.

swalsh 811 days ago

Embeddings are indeed sticky, I was referring to the LLM model itself.

esafak 810 days ago

That's not what investors believe. They believe that due to training costs there will be a handful of winners who will reap all the benefits, especially if one of them achieves AGI. You can tell by looking at what they've invested most in: foundation models.

phillipcarter 810 days ago

I don't think I agree with that. For my work at least, the only model I can swap with OpenAI and get similar results is Claude. None of the open models come even close to producing good outputs for the same prompt.

n2d4 811 days ago

There's at least an argument to be made that this is because all the models are heavily trained on GPT-4 outputs (or whatever the SOTA happens to be during training). All those models are, in a way, a product of inbreeding.

fragmede 811 days ago

But is it the kind of inbreeding that gets you Downs, or the Kwisatz Haderach?

batshit_beaver 810 days ago

Yes

pram 811 days ago

Consider the bulldog: https://youtube.com/watch?v=hUgmkCgMWbg

sumo43 811 days ago

Maybe true for instruct, but pretraining datasets do not usually contain GPT-4 outputs. So the base model does not rely on GPT-4 in any way.

mnemoni_c 811 days ago

Yeah, it feels like transformer LLMs are at or getting closer to diminishing returns. We'll need some new breakthrough, likely an entirely new approach, to get to AGI levels.

Tubbe 811 days ago

Yeah, we need a radically different architecture in terms of the neural networks, and/or added capabilities such as function calling and RAG, to improve on the current SOTA.

mattsan 811 days ago

can't wait for LLMs to dispatch field agent robots who search for answers in the real world that's not online /s

htrp 811 days ago

skynet would like a word

throwaway74432 811 days ago

LLMs are a commodity

https://www.investopedia.com/terms/c/commodity.asp

paxys 811 days ago

Maybe, but that classification by itself doesn't mean anything. Gold is a commodity, but having it is still very desirable and valuable.

Even if all LLMs were open source and publicly available, the GPUs to run them, the technical know-how to maintain the entire system, fine-tuning, the APIs and app ecosystem around them, etc. would still give the top players a massive edge.

throwaway74432 811 days ago

Of course realizing that a resource is a commodity means something. It means you can form better predictions of where the market is heading as it evolves and settles. For example, people are starting to realize that these LLMs are converging on being fungible. That can be communicated by the "commodity" classification.

YetAnotherNick 811 days ago

Even in the most liberal interpretation of "prove", it doesn't do that. GPT-4 was trained before OpenAI had any special data, a deal with Microsoft, or product-market fit. Yet no model has beaten it in a year. And Google, Microsoft, and Meta definitely have better data and more compute.

gerash 811 days ago

The evaluations are not comprehensive either. All of them are improving, and you can't expect any of them to hit 100% on the metrics (à la the Bayes error rate). It gets increasingly difficult to move the metrics as they get better.

falcor84 811 days ago

> this proves that all llm models converge to a certain point when trained on the same data

They are also all trained to do well on the same evals, right? So doesn't it just boil down to neural nets being universal function approximators?

bevekspldnw 811 days ago

The big thing for locally hosted is inference efficiency and speed. Mistral wears that crown by a good margin.

crooked-v 811 days ago

Of course, part of this is that a lot of LLMs are now being trained on data that is itself LLM-generated...