| HN Mirror

	by infecto 443 days ago
	Has anyone tested Google's functionality vs. ChatGPT? I have lightly played around with it, but felt that ChatGPT's implementation generally sounded a little more educated and took on whatever persona was necessary well.

2 comments

nico 443 days ago

Just did a test last week and OpenAI's research was way better. It found 10x more sources and did a pretty great job overall.

The task was to look up information about a late, distant family member who had been a prominent employee in a certain foreign government about 100 years ago.

Gemini barely scratched the surface and pretty much gave up.

ChatGPT, on the other hand, kept building on its research, connecting the dots and leveraging each bit of acquired information to try to find more.

consumer451 443 days ago

Would love to see this repeated with this latest version from Google.

Man, what's really missing from all of this is a third-party, Consumer Reports-style site for all of these LLM tools. Whoever does this thing that doesn't scale will have a highly referenced site on their hands.

jeffbee 443 days ago

Throughout the entire 20th century the main determinant of a Consumer Reports rating for a car was whether you could put a wheelchair in the trunk. Hopefully the AI agent industry does not sprout a similarly worthless metric.

consumer451 443 days ago

I almost didn't use that as the comparison for their lack of rigor, but it gets the idea across.

shigawire 443 days ago

Isn't that what llmarena does?

consumer451 443 days ago

It tries to in a way that scales easily, and is also easily gamed.

I want a staff of human testers, each with domain expertise. If the goal is to replace humans, should there not be a real human metric?

I want a physicist asking their battery of physics questions, 4 different kinds of devs asking their battery of dev problems, a couple chefs asking for cooking techniques, etc.

Now on to "Deep Research": 6 different kinds of OSINT/secondary analysts who pose new problems each time and compare the results to their own days of human work.

We really need this as a species, otherwise the brain dead C-Suites of the world are going to keep buying the hype, which is often very premature. This could have real consequences, and it apparently already has.

It's insane to me that we are investing, what, almost $1T into LLMs, and have not spent the ~$1.5M/year to do what I described above.

consumer451 442 days ago

^ I really should have used "myopic" instead of "brain dead" to describe the C-Suites of the world. My apologies.

SequoiaHope 442 days ago

I suppose Consumer Reports could do it!

phonon 443 days ago

But this is a brand new version? Why not run it again on Gemini 2.5 Deep Research mode and report if it's better?

infecto 443 days ago

I do think they are leaps ahead of everyone else from a product perspective, and every day it's looking more like the battleground where the money is going to be made.

arresin 443 days ago

I haven’t used 2.5 Pro, just 2.0 Pro. It was inferior to OpenAI's (which isn’t that good).

My ranking: OpenAI > Grok 3 deeper > Gemini 2.0 Pro. All have been terrible in the 100 or so times I’ve used them (all SWE/finance related in some way).

infecto 443 days ago

Conversely, our group has been getting huge gains from OpenAI's implementation for certain workflows related to finance deals. We don’t use it for quant work, though; it's all qualitative research.