New Gemini model significantly outperforms others on Chatbot Arena (LMSYS) | HN Mirror

	New Gemini model significantly outperforms others on Chatbot Arena (LMSYS) (lmarena.ai)
	110 points by zopper 566 days ago

6 comments

impulser_ 566 days ago

Based on my testing, this model is significantly better than other Gemini models, especially on programming- and math-related tasks. The current Gemini models are pretty useless for anything related to programming or math, but this experimental model puts Gemini ahead of GPT-4o and pretty close to Claude 3.5.

The major problem with Claude 3.5 is that you can't have a conversation involving a large amount of text, because you constantly hit rate limits, which is very annoying.

With its 2-million-token context window, this model is probably the best one right now for programming.

Alifatisk 565 days ago

I wish I had known about Google AI Studio much earlier. I don't understand why Google hasn't marketed it; I found out about it through word of mouth.

chenxi9649 566 days ago

I feel like it's at the point where I'm not too sure how these rankings impact my choice of LLM. Every time a new model tops the charts, I try it for a bit and then go back to claude-3.5-sonnet, both for coding and for day-to-day questions.

I don't know if I'm just getting used to the Claude style of response, or to the orangy UI that I find kind of cozy, but I think we need better ways to convey the difference between models.

maxglute 565 days ago

>orangy UI that I kind of find cozy

Yeah, it is strangely cozy. I also can't disassociate Claude from Jean-Claude Van Damme, and it makes me giggle thinking he is helping me code.

wkat4242 564 days ago

I also think it's really cool how he parodies himself. Most of the other martial arts actors from the 80s take themselves way too seriously now, like Steven Seagal, who just phones it in in B-movies. Jean-Claude Van Johnson was awesome.

Alifatisk 565 days ago

Claude has been my go-to, mainly because of the huge context window. But today, that doesn't seem to be the case, or you hit the rate limit pretty quickly and have to wait a whole day.

Google AI Studio with its 2M context window + this experimental version could be a good replacement.

a2128 563 days ago

I would bet that Google will also add rate limits once they've burned enough money to attract users

leobg 566 days ago

Google has one moat that is often overlooked: Googlebot. They get to scrape content that is invisible to pretty much every other crawler, thanks to Cloudflare and paywalls.

stormfather 565 days ago

And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which piece of content the user selected. That surely can be used in some way to give them a leg up both in choosing good training data and in building o1-type agentic models.

leobg 559 days ago

You’re right. They can actually do RLHF just using their users, by showing each of them slightly different generations and watching their behavior.

achempion 565 days ago

Most of the content they crawl is SEO spam; I'm not sure it's that helpful for model training.

leobg 559 days ago

SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.

jug 564 days ago

I feel like these are test versions of Gemini Pro 2.0. The changes are too foundational to be mere iterations/break date updates for 1.5 Pro.

ralfd 566 days ago

What is the new Gemini model? 1.5-pro-002?

alphabetting 566 days ago

Here is a link to the latest one: https://aistudio.google.com/app/prompts/new_chat?model=gemin...

1.5 Pro-002 came out a couple months ago.

d4rkp4ttern 565 days ago

Where’s the info on context length etc? Can’t seem to find the official specs page.

kvn8888 564 days ago

It shows the context length on the AI Studio site

2 million for gemini-exp-1206, 32k for the other experimental Gemini, which I think is gemini-exp-1121.

famouswaffles 566 days ago

Gemini Experimental 1206. It's on aistudio