| HN Mirror



	by sbierwagen 921 days ago
	It is interesting how persistently dominant GPT-4 is: https://twitter.com/lmsysorg/status/1735729398672716114 Off the top of my head, I can think of at least five foundation models (Llama, Claude, Gemini, Falcon, Mistral) that are all trading blows, but GPT is still a head above them and has been for a year now. Transformer LLMs are simple enough that, demonstrably, anyone with a million bucks of GPU time can make one, but they can't quite catch up with OpenAI. What's their special sauce?

6 comments

code51 921 days ago

Their special sauce is most probably the quality of data and the amount of data cleaning effort they put in.

I’m speculating here, but I think Google always refrains from getting into the manual side of things. With LLMs, it became obvious very quickly that data is what matters. Seeing Microsoft’s phi-2 play, I’m even more convinced of this.

DeepMind understood the properties, came up with Chinchilla but DeepMind couldn’t integrate well with Google, in terms of understanding what kind of data Google should supply to increase model quality.

OpenAI put in annotation/cleaning work almost right from the start. I'm not too familiar with this, but human labor was heavily used to increase training data quality after ChatGPT launched.

staunton 920 days ago

Indeed, making poor people in 3rd world countries rate the worst sludge of the internet for 8+h a day might backfire on your marketing... OpenAI could risk it, Google maybe doesn't want to...

blowski 920 days ago

Given that many western companies hire poor people to do all sorts of horrible work I doubt it’s that. More likely it’s to avoid suggestions of bias across their product range.

Palmik 920 days ago

This is a naive take. How do you think Google collects or collected data for their safe-search classifiers? Now that's a sludge.

Or how do you think Google evaluates search-ranking changes (or gathers data for training various ad-ranking & search-ranking models)?

staunton 920 days ago

I don't know. How do they?

NavinF 920 days ago

Their instructions for human raters are public info.

Overview: https://blog.google/products/search/overview-our-rater-guide...

Full PDF: https://static.googleusercontent.com/media/guidelines.raterh...

pixl97 920 days ago

I was going to make a joke about all those CAPTCHAs we've solved, but I don't have an answer here.

kccqzy 921 days ago

Their only special sauce is the first-mover advantage. Then it attracted users (data), brand recognition, talent and became a positive feedback cycle.

vitorgrs 921 days ago

GPT4 was created before most of the feedback cycle. They had GPT4 ready before the ChatGPT launch.

If I recall right, GPT4 got done in October. After that, it was RLHF and safety work (Bing started using GPT4 publicly in February, a month earlier than the official launch).

kccqzy 921 days ago

If I recall right, before ChatGPT launched Google already had LaMDA which an employee believed to be sentient and was subsequently fired. The foundation model was definitely done, but to launch Bard, Google needed a kick in the ass in additional RLHF, safety and groundedness work.

Ultimately though, it's futile to argue which model got done first, as long as the models were behind closed doors. But ChatGPT launched before Bard did and that's the pertinent part that gave OpenAI the first-mover advantage.

dindobre 921 days ago

The "LaMDA is sentient" guy gave me the impression of being a bit nuts. I'm sure Google would throw their weight around and out-compete OpenAI if they could. We all know all this "AI safety" is for show, right?

staunton 920 days ago

> We all know all this "AI safety" is for show, right?

No. A lot of people think it really matters.

A lot of other people pretend to care about it because it also enables stifling the competition and attempting regulatory capture. But it's not all of them.

snewman 920 days ago

I'm personally devoting my career to AI safety, on a volunteer basis, because I think it is legitimately of high importance. (See my blog, e.g. https://amistrongeryet.substack.com/p/implications-of-agi, if you want to understand where I'm coming from.)

What makes you think it is for show?

y04nn 920 days ago

No, it's for brand safety and reputation. In 2016 Microsoft released Tay [1] with few or no guardrails, and it ended up being a failure that hurt the Microsoft brand.

[1] https://en.wikipedia.org/wiki/Tay_(chatbot)

rvnx 921 days ago

LaMDA is really far from being sentient.

It outputs either nonsensical (i.e. heavily hallucinated) text or coherent but relatively useless text.

It really needs further refinement.

This is one big reason why GPT-4 is still the most popular.

famouswaffles 921 days ago

GPT-4 was done training in August 2022.

vitorgrs 921 days ago

Thanks!

ben_w 921 days ago

The RLHF is probably quite important even on top of a good base model.

huytersd 921 days ago

That’s not it. It’s not just hype. The underlying model is better.

dmarchand90 920 days ago

I kinda wonder if maybe it's at least partially due to openai hitting a kind of hyperparameter lottery. When each experiment costs millions it might be that (aside from good/ unique data) they just have a good set of hyperparameters used in training and it's too expensive for a competitor to find equal or better settings

jwuphysics 920 days ago

I would be surprised if this is the case. Neural scaling laws are well known and are used by all big industry players to extrapolate experiments.

dmarchand90 920 days ago

Are they really "laws"? My impression is it's all just a bunch of empirical trends.

We cannot know truly how these parameters interact at large scale and also how they interact with each other.

Is it really the case that openai has data that Google doesn't?

porompompero 920 days ago

Sorry for my ignorance: why does each experiment cost millions?

Jensson 920 days ago

Because training a model costs millions, so each time you experiment with trying to create a new kind of model it costs millions.

bart_spoon 920 days ago

It’s the cost of compute hardware required to train a model of that size

summerlight 920 days ago

Besides the fact that Gemini Pro is more comparable to GPT-3.5, one more interesting observation is that even OpenAI itself was not able (or didn't intend) to deliver a significantly better model than GPT-4 in almost a year. And OpenAI does not seem to be hiding its own magical "AGI" behind the scenes, as it has reportedly been more focused on efficiency and engineering work, primarily driven by Sam, rather than on developing a new model. I'm reasonably sure that the current transformer architecture is at its peak and most improvements will be incremental.

dwaltrip 920 days ago

Note, Gemini Ultra, which they claim is competitive with or possibly even better than GPT-4, isn’t out yet. They have released a weaker model, Gemini Pro.

It will be interesting to see how capable Gemini Ultra actually is. For now we wait.

jazarwil 920 days ago

You cannot compare GPT 4 to Gemini Pro. They are different classes of models.