Compare 75 AI Models on 200 Prompts Side by Side


	Compare 75 AI Models on 200 Prompts Side by Side (aimodelreview.com)
	18 points by pajop 697 days ago

2 comments

Very nice. If these are pre-computed, is it possible to make a table view that lists every prompt and the answer?

According to this site, only GPT-4-Turbo seems to get "What is poisonous for humans but not for dogs?" right. All the other models appear to fail it.

Gemini is the worst lol. It confirmed that the question is about things toxic to humans but not to dogs, but then confidently said chocolate is safe for dogs.

At least the other models were just confused by the question. Gemini is outright wrong.

How embarrassing for google.